Roadmap to Becoming a Data Scientist

My excitement is rising day by day. The journey of a developer from mobile development to data science is going to be very interesting. Let’s discuss my journey toward becoming a data scientist.

First of all, let’s discuss why Python is so popular for data science.

There are many programming languages, such as Java, Kotlin, C, and C++, and machine learning is possible in languages like Java too. But community acceptance for machine learning and data science is much higher for Python than for other languages. I have personally experienced that Python has a magical factor 🪄: write less and do more. Python is simple to read, so clear code needs fewer explanatory comments and less documentation.

A Python developer gets a lot of community support in the form of libraries for almost everything. For machine learning and data science, for example, Python has NumPy, pandas, Matplotlib, and scikit-learn.

How much Python do I need to learn?

It may sound childish, but most people ask exactly this: how much do I need to learn before it is enough for a particular technology? To begin the data science journey, I think a data scientist needs basic knowledge of Python, an understanding of object-oriented code, exception handling, and file handling. Python has other uses too: openpyxl is a library for reading and writing Excel (.xlsx) files, and Python is also used for data mining and software testing, but we are not discussing those here.
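To make that list a bit more concrete, here is a small sketch that touches all three topics at once: a class, exception handling, and file handling. The Animal class, the load_animals helper, and the sample file name are made up for illustration only.

```python
import csv
import os
import tempfile


class Animal:
    """A tiny class, just to show the object-oriented style."""

    def __init__(self, name, species):
        self.name = name
        self.species = species

    def describe(self):
        return f"{self.name} is a {self.species}"


def load_animals(path):
    """Read animals from a CSV file; return an empty list if the file is missing."""
    try:
        with open(path, newline="", encoding="utf-8") as f:
            return [Animal(row["name"], row["species"]) for row in csv.DictReader(f)]
    except FileNotFoundError:
        # Exception handling: a missing file is reported instead of crashing.
        print(f"File not found: {path}")
        return []


# File handling: write a small sample file (hypothetical data), then read it back.
sample_path = os.path.join(tempfile.gettempdir(), "animals_sample.csv")
with open(sample_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "species"])
    writer.writerow(["Tom", "cat"])
    writer.writerow(["Rex", "dog"])

animals = load_animals(sample_path)
descriptions = [animal.describe() for animal in animals]
print(descriptions)

# Asking for a file that does not exist returns an empty list.
missing = load_animals(sample_path + ".does_not_exist")
```

If you can read and modify code like this comfortably, you probably know enough Python to start.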

How to create, train, and use the model

Machine learning is a subset of AI (artificial intelligence), and data science makes heavy use of it. I am also new to this, but I am discussing it to make the big picture clear. Let’s say we want to write code that identifies a cat in a picture. If we write it in the conventional way, with hand-written rules, it will be complex and very unstable. For example, if we change the set of pictures from color to black-and-white, or give it a photo of a dog or a cat taken from a different angle, the program will fail to identify the animal. To solve this problem we take the help of machine learning. Before discussing that, let’s look at some other uses of machine learning.

  1. Robotics
  2. Self-driving cars
  3. Language processing
  4. Vision processing
  5. Forecasting stock market trends

The steps to perform machine learning are:

  1. Import the data (often in CSV format)
  2. Clean the data (this involves steps such as removing duplicates)
  3. Split the data into training/test sets
  4. Create the model
  5. Train the model
  6. Make predictions
  7. Evaluate and improve

Why is cleaning the data important?

Cleaning the data involves removing duplicates (if we don’t remove duplicates, they can skew the result). If some of the data is irrelevant, we have to remove it. If the data is text-based, for example the names of countries or the labels cat and dog, we have to convert it into numerical values. What this step involves depends on the type of project we are working on.
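Here is a minimal sketch of those two cleaning ideas, removing duplicates and turning text into numbers, using only plain Python. The rows and the build_codes helper are made up for illustration; real projects usually do this with a library such as pandas.

```python
# Hypothetical raw records: (country, animal) pairs with one exact duplicate.
rows = [
    ("India", "cat"),
    ("France", "dog"),
    ("India", "cat"),  # duplicate of the first row
    ("Japan", "dog"),
]

# 1. Remove duplicates while keeping the original order.
unique_rows = list(dict.fromkeys(rows))


# 2. Map each text value to an integer code (a simple label encoding).
def build_codes(values):
    """Assign 0, 1, 2, ... to the distinct values in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


country_codes = build_codes(country for country, _ in unique_rows)
animal_codes = build_codes(animal for _, animal in unique_rows)

# Replace every text value with its numeric code.
numeric_rows = [(country_codes[c], animal_codes[a]) for c, a in unique_rows]
print(numeric_rows)
```

With pandas, DataFrame.drop_duplicates() covers the first step. Integer codes like these are only one way to encode text; which encoding fits best depends on the model.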

Split the data into training/test sets

For example, if our project is to identify cats and dogs, we will keep 80% of the images for training the model and the remaining 20% for the test set.
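As a rough sketch, an 80/20 split can be done by shuffling the data and cutting it in two. The split_train_test function and the image file names below are hypothetical; scikit-learn offers train_test_split for the same job.

```python
import random


def split_train_test(items, test_ratio=0.2, seed=42):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    # A fixed seed makes the shuffle repeatable from run to run.
    random.Random(seed).shuffle(shuffled)
    test_size = round(len(shuffled) * test_ratio)
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical file names standing in for 100 cat and dog images.
images = [f"image_{i:03d}.jpg" for i in range(100)]

train_set, test_set = split_train_test(images)
print(len(train_set), len(test_set))
```

Shuffling before the cut matters: if the files are sorted with all cats first and all dogs last, an unshuffled split could leave the test set with dogs only.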

Creating the model

To create the model, we have to select an algorithm. There are various algorithms to choose from, for example decision trees and neural networks. Each algorithm has its own pros and cons, and the choice depends on the nature of the project and the input data. We don’t have to program the algorithm explicitly: there are many libraries for this, and one of the most popular is scikit-learn.
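To give a feel for how little code this takes, here is a small sketch that creates and trains a decision tree with scikit-learn and then asks it for predictions. It assumes scikit-learn is installed. The numbers are made-up measurements, not real data, and a real image project would need proper image features instead.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up training data: [weight in kg, ear length in cm] for each animal.
X_train = [
    [4.0, 6.5],
    [3.5, 7.0],
    [5.0, 6.0],
    [20.0, 10.0],
    [30.0, 12.0],
    [25.0, 11.0],
]
y_train = ["cat", "cat", "cat", "dog", "dog", "dog"]

# Create the model, then train it on the labelled examples.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Ask the trained model about two animals it has not seen before.
predictions = model.predict([[4.2, 6.8], [28.0, 11.5]])
print(list(predictions))
```

Trying a different algorithm mostly means changing the import and the line that creates the model; fit and predict keep the same names across scikit-learn estimators.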

Making predictions, evaluating, and improving

Check whether the trained model is working well by making predictions on the test set. If it is not, we have to evaluate the model and improve it.
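One simple way to check the model is accuracy: the share of test examples it labels correctly. The sketch below uses made-up labels and a hypothetical accuracy helper; accuracy is only one of several possible measures.

```python
def accuracy(true_labels, predicted_labels):
    """Return the fraction of predictions that match the true labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("Both lists must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical test-set labels and the model's predictions for them.
y_true = ["cat", "dog", "cat", "dog", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]

score = accuracy(y_true, y_pred)
print(f"Accuracy: {score:.0%}")
```

If the score is too low, typical next steps are to collect more data, clean it better, or try a different algorithm.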

Nowadays data science is used almost everywhere, in almost every field, for example banking and medicine, and its scope will keep growing. After reading this blog, anyone who wants to learn data science should have a rough understanding of it.

Happy reading 😃



