Data science has become a sought-after field across industries because data is now generated in enormous volumes and at high velocity. Businesses in every sector are building new strategies to profit from that data. This is where the data scientist comes in: someone who not only draws meaningful insights out of huge and complex data but also has strong domain knowledge, is flexible enough to use different tools as and when needed, and keeps their skills up to date to remain a valuable asset to the organization. To become a professional data scientist, you need a firm grasp of domain knowledge, statistics, programming, data interpretation, and communication.

Data science competition and information websites to follow: Kaggle, Stanford Online, crowdAnalytix, KDnuggets, Analytics Vidhya, EliteDataScience, statistics, simplystats

Let us see the 10 most used tools for Data Scientists in the order of their preference:


TensorFlow:

TensorFlow is an open-source software library that serves as a machine learning framework, used especially for building neural networks in machine learning applications. It was developed by Google for internal use and later released under the Apache 2.0 open-source license on November 9, 2015.

Official site:


Websites to follow: TensorFlow, learningtensorflow, Python Deep Learning Tutorialspoint, pythonprogramming


Python:

Python is an open-source, interpreted, high-level, general-purpose programming language. Programmers and data science practitioners with a technical background often prefer Python as their core language for building algorithms and training machine learning models. It was designed by Guido van Rossum and first released in 1991; today it is developed under the stewardship of the Python Software Foundation, a non-profit organization, and distributed under the Python Software Foundation License.
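As a rough illustration of building an algorithm and fitting a simple model in Python, the sketch below fits a straight line by ordinary least squares using only the standard library. The function name `fit_line` and the toy data are hypothetical and chosen for this example; in practice a library such as scikit-learn would usually be used instead.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical toy data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # prediction for 6 hours of study
print(slope, intercept, predicted)
```

The same "fit, then predict" shape carries over to most machine learning libraries, which is part of why Python is a common first language for this work.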

Course: Data Scientist with Python

Official site:

Documentation: Python 3.7.0 and Python 2.7.15

Websites to follow: realpython, Analytics Vidhya, Python Tutorialspoint, Python Data Science Handbook


R:

R is an open-source statistical programming language with built-in graphics support. It was designed by Ross Ihaka and Robert Gentleman, is developed by the R Core Team, first appeared in August 1993, and is distributed under the GNU GPL v2 license.

Course: Data Scientist with R

Official site:

Documentation: R-Manuals

Websites to follow: R-bloggers, RStudio Blog

Spark – MLlib:

Apache Spark is a unified, in-memory analytics engine for large-scale data processing, including near-real-time stream processing. The project has claimed speeds of up to 100x faster than Hadoop MapReduce for some in-memory workloads, and applications can be written in Scala, Java, Python, and R. It includes libraries such as Spark SQL, Spark Streaming, MLlib, and GraphX.
MLlib is Apache Spark’s scalable machine learning library. See the Machine Learning Library (MLlib) Guide.

Spark was originally developed at UC Berkeley's AMPLab and is now maintained by the Apache Software Foundation, with Databricks as a major contributor. It was released under the Apache License 2.0 on May 26, 2014.

Spark API Docs:

  1. Spark Python API Doc
  2. SparkR API Doc
  3. Spark Scala API Doc
  4. Spark Java API Doc
  5. Spark SQL API Doc

Official site:


Websites to follow: Dataflair, Apache Spark Tutorialspoint, datacamp


Hadoop:

Hadoop is an open-source software framework for storing big data and running applications on clusters of commodity hardware, using a Java-based implementation of the MapReduce programming model.

Commercial distributions of Hadoop are currently offered by four primary vendors of big data platforms: Amazon Elastic MapReduce, Cloudera CDH Hadoop Distribution, Hortonworks Data Platform (HDP) and MapR Hadoop Distribution.
It was developed by the Apache Software Foundation and released under the Apache License 2.0 on December 10, 2011.
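The MapReduce programming model that Hadoop implements can be illustrated with a small word count. The sketch below is plain single-machine Python, not Hadoop's Java API; the function names `map_phase`, `shuffle_phase`, and `reduce_phase` and the sample documents are hypothetical and exist only to show the three stages.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input records; on a real cluster these would be file splits.
documents = ["big data big clusters", "data on commodity hardware"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(word_counts)
```

On a real cluster the map and reduce steps run in parallel on different machines, and the framework handles the grouping step and recovery from node failures.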

Official site:


Websites to follow: Dataflair, Hadoop Tutorialspoint, Yahoo Developer Network, guru99

Amazon Web Services:

AWS provides an on-demand cloud computing platform and cloud-based services on a paid subscription basis to individuals, businesses, and governments. Andy Jassy led AWS as its CEO until 2021, when he became CEO of Amazon.
AWS was launched by its parent company, Amazon, in March 2006.


  1. Machine Learning on AWS
  2. Data Lakes and Analytics on AWS
  3. Lynda-AWS ML essential training

Official site:

Documentation: AWS Guides and API References

Websites to follow: AWS Tutorialspoint, cloudacademy, guru99

Jupyter Notebooks:

The Jupyter Notebook is an open-source interactive web application that lets you create and share documents containing live code, equations, visualizations, and narrative text. It was released in 2015 by Project Jupyter, which was co-founded by Fernando Pérez.

Official site:

Installation: Installing the Jupyter Notebook

Documentation: Jupyter Interactive Notebook

Websites to follow: codecademy, datacamp

Microsoft Azure Machine Learning:

Azure Machine Learning is part of the Cortana Intelligence Suite and enables predictive analytics. It provides a collaborative workspace where you can build, test, and deploy predictive models and analytical solutions on your data using drag-and-drop tools. It was developed by Microsoft.

Official site:

Product: Azure Machine Learning Studio

Documentation: Azure Machine Learning Services documentation

Websites to follow: Azure Tutorialspoint, datasciencedojo, cloudacademy


Tableau:

Tableau is a data visualization product for creating storytelling dashboards that focus on business intelligence. The company was founded in January 2003 in Mountain View, California, by Christian Chabot, Chris Stolte, and Pat Hanrahan, and it became a public company in May 2013.

Official site:

Websites to follow: Tableau Training, Tableau Tutorialspoint


SQL:

SQL (Structured Query Language) is the language used to interact with a relational database management system (RDBMS) and to manage its data through queries and statements. It was first standardized in 1986 and is maintained today as the ISO/IEC 9075 standard.
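To show what managing data with queries and statements looks like, here is a minimal sketch that runs SQL through Python's built-in `sqlite3` module against an in-memory database. The `sales` table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# An in-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Statements: define a table and insert a few hypothetical rows.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

# Query: total sales per region, sorted by region name.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)

conn.close()
```

The `CREATE TABLE`, `INSERT`, and `SELECT ... GROUP BY` statements are standard SQL, so the same pattern applies with small dialect differences to MySQL, PostgreSQL, and other relational databases.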

Official site:

Courses: SQL – MySQL for Data Analytics and Business Intelligence, SQL for Data Science

Websites to follow: W3Schools, SQL Tutorialspoint, SQL Cheat Sheet


SAS:

SAS (Statistical Analysis System) is a software suite used for advanced analytics, multivariate statistical analysis, data management, business intelligence, graphical data representation, and predictive modeling.
It was developed by SAS Institute and released under a proprietary license in 1976.
SAS also provides academic programs such as Free SAS e-Learning.

Official site:

Documentation: SAS 9.4 and SAS Viya 3.4 Programming Documentation

Websites to follow: SAS Tutorialspoint, KentState, SAS Global Forum


Microsoft Excel:

Microsoft Excel is a spreadsheet application used for calculations, pivot tables, and charts, and it supports a macro programming language known as Visual Basic for Applications (VBA).
It was developed by Microsoft, which released the first Windows version in 1987, and it is distributed as trialware.

Official site:

Websites to follow: MSExcel Tutorialspoint, Microsoft Excel Help Center, GCFGlobal, Excel Exposure, Contextures

Top certification courses on Data Science:

  1. Machine Learning, created by Stanford University on Coursera.
  2. Deep Learning Specialization by Andrew Ng on Coursera.
  3. Data Scientist Nanodegree program by Udacity.
  4. Machine Learning A-Z™: Hands-On Python & R In Data Science, created by Kirill Eremenko, Hadelin de Ponteves, SuperDataScience Team, and SuperDataScience Support.
  5. edX Data Science Courses by top universities and institutions.
  6. Data Scientist Masters Program by Simplilearn (course advisors: Ronald van Loon, Mike Tamir).
  7. Free Online Courses in Data Science by Class Central.
  8. Analytics training courses by Experfy.
  9. Data Science Courses by FutureLearn. FutureLearn offers online maths, science, and engineering courses created by experts from leading universities and organisations. It also helps learners brush up on basic science and numeracy skills or master advanced topics like robotics and forensics.

Some Bootcamps for Data Science:
