Data Science has become a trending field of study across many industries, driven by the tremendous growth in the volume and velocity of data being generated. Businesses in almost every sector have started building new strategies to profit from this data. This is where the data scientist comes in. Beyond deriving meaningful insights from huge and complex data, a data scientist needs strong domain knowledge, the flexibility to pick up different tools as and when needed, and up-to-date skills to remain a valuable asset to the organization. To become a professional Data Scientist, you must have a strong hold on domain knowledge, statistics, programming, data interpretation, and communication skills.
Data science competition and information websites to follow: Kaggle, Stanford Online, CrowdANALYTIX, KDnuggets, Analytics Vidhya, EliteDataScience, statistics, simplystats
Let us look at the 12 most used tools for Data Scientists, in order of their preference:
TensorFlow:
TensorFlow is an open-source software library that works as a machine learning framework. It is especially used for building neural networks in machine learning applications. It was developed by Google for internal use and later released under the Apache 2.0 open-source license on November 9, 2015.
Official site: https://www.tensorflow.org
Repository: https://github.com/tensorflow/tensorflow
Websites to follow: TensorFlow, learningtensorflow, Python Deep Learning Tutorialspoint, pythonprogramming
Python:
Python is an open-source, interpreted, high-level, general-purpose programming language. Programmers and data science practitioners with a technical background often prefer Python as their core language for building algorithms and training machine learning models. It was designed by Guido van Rossum and first released in 1991. It is now developed under the stewardship of the Python Software Foundation, a non-profit organization, and distributed under the Python Software Foundation License.
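To make "training a machine learning model" concrete, here is a minimal sketch in plain Python with no third-party libraries. It fits a straight line y = slope * x + intercept to toy data using the closed-form ordinary least squares solution. The function names `fit_line` and `predict` and the sample data are hypothetical and chosen for illustration. In practice you would more likely use a dedicated machine learning library.

```python
# A minimal sketch: "training" a linear model with ordinary least squares
# in plain Python. fit_line and predict are hypothetical helper names.

def fit_line(xs, ys):
    """Return (slope, intercept) that minimize the squared error on (xs, ys)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products used by the closed-form OLS solution
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input x."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    # Made-up toy data that lies exactly on y = 2x + 1
    xs = [0, 1, 2, 3, 4]
    ys = [1, 3, 5, 7, 9]
    model = fit_line(xs, ys)
    print(model)               # (2.0, 1.0)
    print(predict(model, 10))  # 21.0
```

Because the toy points lie exactly on a line, the fitted slope and intercept recover it. With noisy real data the same formulas give the best-fitting line in the squared-error sense.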
Course: Data Scientist with Python
Official site: https://www.python.org
Documentation: Python 3.7.0 and Python 2.7.15
Websites to follow: realpython, Analytics Vidhya, Python Tutorialspoint, Python-datascience Handbook
R:
Course: Data Scientist with R
Official site: https://www.r-project.org
Documentation: R-Manuals
Websites to follow: R-bloggers, RStudio Blog
Spark – MLlib:
Apache Spark is a unified, in-memory analytics engine for large-scale data processing, including real-time (streaming) workloads. Its project site has claimed that it can run some workloads up to 100x faster than Hadoop MapReduce, largely because it keeps intermediate data in memory. It offers APIs in Scala, Java, Python, and R, and includes libraries such as Spark SQL, Spark Streaming, Spark MLlib, and GraphX.
MLlib is Apache Spark's scalable machine learning library.
Guide: Machine Learning Library (MLlib) Guide
Spark was originally developed at UC Berkeley's AMPLab, with major contributions from Databricks, and is now maintained by the Apache Software Foundation. It was released under the Apache License 2.0 on May 26, 2014.
Official site: https://spark.apache.org/mllib/
Repository: https://github.com/apache/spark
Websites to follow: DataFlair, Apache Spark Tutorialspoint, DataCamp
Hadoop (MapReduce):
Hadoop is an open-source software framework for storing big data and running applications on clusters of commodity hardware. It uses a Java-based implementation of the MapReduce programming model to process that data in parallel.
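A MapReduce job is split into a map step that emits key-value pairs, a shuffle that groups those pairs by key, and a reduce step that combines each group into a result. The sketch below mimics these phases for a word count in a single Python process. It illustrates the programming model only and assumes everything fits on one machine. It is not Hadoop's Java API, and the function names are hypothetical.

```python
# A single-process sketch of the MapReduce programming model (word count).
# This is NOT Hadoop's API: it only mimics the map -> shuffle -> reduce
# phases that Hadoop runs in parallel across a cluster.
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over the input lines; return {word: count}."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data on commodity hardware", "Big data big clusters"]
    print(word_count(docs))
    # {'big': 3, 'clusters': 1, 'commodity': 1, 'data': 2, 'hardware': 1, 'on': 1}
```

In a real cluster, each map and reduce call could run on a different machine, and the framework handles the shuffle over the network. That separation is what lets the model scale out on commodity hardware.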
At the time of writing, commercial distributions of Hadoop were offered by four primary vendors of big data platforms: Amazon Elastic MapReduce, Cloudera CDH Hadoop Distribution, Hortonworks Data Platform (HDP), and MapR Hadoop Distribution.
It was developed by the Apache Software Foundation and released under the Apache License 2.0 on December 10, 2011.
Official site: http://hadoop.apache.org
Repository: https://git-wip-us.apache.org/repos/asf?p=hadoop.git
Websites to follow: DataFlair, Hadoop Tutorialspoint, Yahoo Developer Network, guru99
Amazon Web Services (AWS):
AWS provides an on-demand cloud computing platform and cloud-based services on a paid subscription basis to individuals, businesses, and governments. Andy Jassy led AWS as its CEO until 2021, when he became CEO of Amazon.
AWS was launched by its parent company, Amazon, in March 2006.
Official site: https://aws.amazon.com/
Documentation: AWS Guides and API References
Websites to follow: AWS Tutorialspoint, cloudacademy, guru99
Jupyter Notebooks:
Jupyter Notebook is an open-source, interactive web application that lets you create and share documents containing live code, equations, visualizations, and narrative text. It grew out of the IPython project created by Fernando Pérez and was released as a standalone project in 2015.
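A notebook is saved as an .ipynb file, which is a JSON document holding an ordered list of cells. Markdown cells carry narrative text and equations, while code cells carry code and its outputs. The following sketch assembles a minimal notebook using only Python's standard json module. It assumes notebook format 4.4 field names, which do not require per-cell ids, and the helper functions are hypothetical. Notebooks written by Jupyter itself carry extra metadata, such as the kernel specification.

```python
# A minimal sketch of the JSON structure behind a Jupyter notebook (.ipynb).
# Assumes notebook format 4.4 field names; the helpers are hypothetical.
import json


def markdown_cell(text):
    """A cell holding narrative text (Markdown, including LaTeX math)."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """A code cell; Jupyter fills execution_count and outputs when it runs."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code,
    }


def make_notebook(cells):
    """Wrap a list of cells in the top-level notebook document."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


if __name__ == "__main__":
    nb = make_notebook([
        markdown_cell("# Narrative text\nThe equation $y = 2x + 1$ renders as math."),
        code_cell("x = 3\nprint(2 * x + 1)"),
    ])
    # Print the document; saving this text as example.ipynb (a made-up name)
    # should give a file Jupyter can open.
    print(json.dumps(nb, indent=1))
```

To check a generated file against the official schema, the `nbformat` package provides a `validate` function.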
Official site: https://jupyter.org/index.html
Installation: Installing the Jupyter Notebook
Documentation: Jupyter Interactive Notebook
Websites to follow: codecademy, datacamp
Microsoft Azure Machine Learning:
It is part of the Cortana Intelligence Suite and enables predictive analytics. It provides a collaborative workspace where you can build, test, and deploy predictive models and analytical solutions using drag-and-drop tools. It was developed by Microsoft.
Official site: https://azure.microsoft.com/en-us/
Product: Azure Machine Learning Studio
Documentation: Azure Machine Learning Services documentation
Websites to follow: Azure Tutorialspoint, datasciencedojo, cloudacademy
Tableau:
It is a data visualization product for creating storytelling dashboards focused on business intelligence. Tableau Software, the company behind it, was founded in 2003 in Mountain View, California, by Christian Chabot, Chris Stolte, and Pat Hanrahan, and became a public company in May 2013.
Official site: https://www.tableau.com
Websites to follow: Tableau Training, Tableau Tutorialspoint
SQL:
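SQL (Structured Query Language) is a language for querying and managing data in relational database management systems (RDBMS) through statements and queries. It was originally developed at IBM in the 1970s and was standardized by ANSI in 1986 and by ISO in 1987. The standard is published as ISO/IEC 9075.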
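As a small illustration of SQL statements and queries, the sketch below runs against an in-memory SQLite database through Python's built-in sqlite3 module. The `sales` table, its data, and the `top_products` helper are hypothetical. Other database systems use slightly different SQL dialects, so this is a sketch rather than portable code.

```python
# A minimal sketch of SQL statements and queries, run on an in-memory SQLite
# database via Python's standard sqlite3 module. Table and data are made up.
import sqlite3


def top_products(rows, limit=2):
    """Load (product, quantity) rows and return the best sellers by total."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        # Statement: define a table
        cur.execute("CREATE TABLE sales (product TEXT, quantity INTEGER)")
        # Statement: insert rows using placeholders (avoids SQL injection)
        cur.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # Query: aggregate per product, sort descending, keep the top rows
        cur.execute(
            "SELECT product, SUM(quantity) AS total "
            "FROM sales GROUP BY product "
            "ORDER BY total DESC LIMIT ?",
            (limit,),
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    data = [("apples", 5), ("pears", 2), ("apples", 3), ("plums", 4)]
    print(top_products(data))  # [('apples', 8), ('plums', 4)]
```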
Standard: ISO/IEC 9075
Courses: SQL – MySQL for Data Analytics and Business Intelligence , SQL for Data Science
Websites to follow: W3Schools, SQL Tutorialspoint, SQL Cheat Sheet
SAS:
SAS (Statistical Analysis System) is a software suite used for advanced analytics, multivariate statistical analysis, data management, business intelligence, graphical data representation, and predictive data modeling.
It was developed by SAS Institute and released under a proprietary license in 1976.
SAS also provides academic programs such as free SAS e-learning.
Official site: www.sas.com
Documentation: SAS 9.4 and SAS Viya 3.4 Programming Documentation
Websites to follow: support.sas.com, blogs.sas.com, SAS Tutorialspoint, Kent State, SAS Global Forum
Excel:
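Microsoft Excel is a spreadsheet application developed by Microsoft and distributed under a proprietary (trialware) license. It was first released for the Macintosh in 1985, and the first Windows version followed in 1987.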
Official site: office.microsoft.com
Websites to follow: MSExcel Tutorialspoint, Microsoft Excel Help Center, GCFGlobal, Excel Exposure, Contextures, Chandoo.org
Top Certification Courses in Data Science:
- Machine Learning, created by Stanford University on Coursera.
- Deep Learning Specialization by Andrew Ng’s deeplearning.ai on Coursera.
- Data Scientist Nanodegree program by Udacity.
- Machine Learning A-Z™: Hands-On Python & R In Data Science, created by Kirill Eremenko, Hadelin de Ponteves, SuperDataScience Team, SuperDataScience Support
- edX Data Science Courses by top universities and institutions on edX.org
- Data Scientist Masters Program by Simplilearn (course advisors: Ronald van Loon, Mike Tamir)
- Free Online Courses in Data Science by Class Central.
- Analytics training courses by Experfy.
- Data Science Courses by FutureLearn. FutureLearn offers online maths, science, and engineering courses created by experts from leading universities and organisations. It also helps learners brush up on basic science and numeracy skills or master advanced topics such as robotics and forensics.