May 19, 2020 by Vishesh Agrawal
Data scientist is one of the hottest jobs in the IT sector. So, what exactly is a data scientist? A professional who works with data and finds useful insights in it.
Data science spans statistics, calculus, database management, data visualization, programming, software engineering, domain knowledge, and more. The job starts with data collection (i.e., deciding where to pick up data or scraping it on your own), then moves on to coming up with a hypothesis, performing exploratory analysis (i.e., extracting some interesting insights), building a machine learning model, and lastly sharing your findings through write-ups or presentations. So, when starting the journey of becoming a data scientist, the first question is where to start. I suggest starting with a programming language.
R has many built-in libraries for statistics, data visualization, etc., so it is a good choice. I personally prefer Python. Python is a general-purpose language and has many libraries for data analysis, data visualization, etc. Also, many libraries and frameworks in this domain are built on top of Python or support Python. So, in this blog, I will mention libraries that support Python. Also, for extracting and manipulating data in relational databases, SQL is the fundamental language, used almost everywhere.
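As a small illustration of how SQL and Python fit together, here is a minimal sketch using Python's standard `sqlite3` module. The `orders` table and its rows are made up for this example; a real project would connect to an existing database instead.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 14.5)],
)

# Extraction and manipulation in one query: total spend per customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(rows)
```

The same `SELECT ... GROUP BY` query would work largely unchanged on other relational databases, which is one reason SQL is worth learning early.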
After selecting your programming language, catching up on the basic math of statistics, calculus and linear algebra is a good next step. This is essential for a data scientist to understand the mechanisms behind how different algorithms work. It builds intuition about how to tweak or modify algorithms to solve unique business problems. Also, knowing statistics helps you convert your findings from experimental design tests (e.g., A/B testing) into key business metrics. For numerical calculations, use libraries based on vectorization, e.g., NumPy, because they work faster than element-by-element Python loops. For storage, use a library that supports DataFrames, like Pandas.
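To make the vectorization point concrete, here is a minimal sketch that standardizes a few numbers twice: once with a plain Python loop and once with a single NumPy expression. The `heights_cm` values and column names are invented for illustration.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical measurements, used only for illustration.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])

# Loop version: compute mean, standard deviation and z-scores step by step.
mean = sum(heights_cm) / len(heights_cm)
std = math.sqrt(sum((h - mean) ** 2 for h in heights_cm) / len(heights_cm))
z_loop = [(h - mean) / std for h in heights_cm]

# Vectorized version: one expression applied to the whole array at once.
z_vec = (heights_cm - heights_cm.mean()) / heights_cm.std()

# Store the raw values and the result side by side in a DataFrame.
df = pd.DataFrame({"height_cm": heights_cm, "z_score": z_vec})
print(df)
```

Both versions give the same numbers; on four values the speed difference is invisible, but on large arrays the vectorized form is typically much faster because the loop runs inside NumPy's compiled code.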
From here, you start with data preparation (i.e., importing data, aggregation, pivoting and missing value treatment) and data visualization (i.e., bar charts, histograms, pie charts, heat maps, and map visualizations). For data visualization, Matplotlib and Seaborn are good libraries. Data visualization gives you lots of information about the given data.
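The preparation steps named above can be sketched with Pandas as follows. The `sales` table is made up, and filling missing values with the column mean is only one possible treatment, chosen here to keep the example short.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data with one missing revenue value.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100.0, np.nan, 80.0, 120.0],
})

# Missing value treatment: replace NaN with the mean of the known values.
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].mean())

# Aggregation: total revenue per region.
totals = sales.groupby("region")["revenue"].sum()

# Pivoting: one row per region, one column per quarter.
pivoted = sales.pivot(index="region", columns="quarter", values="revenue")

print(totals)
print(pivoted)
```

With Matplotlib installed, a result like `totals` can be turned into a bar chart with `totals.plot(kind="bar")`, which is often the quickest way to get a first look at the data.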
Now it's time to build the model. For model building I prefer TensorFlow 2 and Keras. The Sequential model of Keras is easy to start with. It will give you a basic understanding of TensorFlow 2 and Keras, and you will also enjoy building your own model. It is also good to understand well-known network architectures like LeNet-5.
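As a starting point, here is a minimal sketch of a LeNet-5-style network written with the Keras Sequential model. It is a simplified modern variant rather than an exact reproduction of the original: the 32x32 grayscale input, tanh activations, softmax output, and the optimizer and loss choices are all assumptions made for this example.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# LeNet-5-style stack: two convolution + pooling stages, then three dense layers.
model = models.Sequential([
    layers.Input(shape=(32, 32, 1)),            # assumed 32x32 grayscale images
    layers.Conv2D(6, kernel_size=5, activation="tanh"),
    layers.AveragePooling2D(pool_size=2),
    layers.Conv2D(16, kernel_size=5, activation="tanh"),
    layers.AveragePooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),
    layers.Dense(84, activation="tanh"),
    layers.Dense(10, activation="softmax"),     # assumed 10 output classes
])

# Compile with common defaults for integer-labelled classification.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

model.summary()
```

Training would then be a call to `model.fit` on your own images and labels; reading the output of `model.summary()` layer by layer is a good way to see how each stage changes the shape of the data.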
Finally, you have to showcase your potential as a data scientist candidate. Once you are familiar with doing data science, you must have a project portfolio. A project portfolio is your best opportunity to show what you have done through learning and work experience.
In my case, I did both a write-up and a video podcast while working on a capstone project with an assigned mentor. I cannot emphasize enough the importance of having a mentor who can work with you directly, one on one. Your mentor is the best friend to guide you and to ask for help when you get stuck on project ideas, tuning your model, communicating your results, etc. In fact, some research suggests that having a mentor can boost your career five times more than going without one.
Note: VchipAI provides certified courses with internships. You can learn more about VchipAI at http://ai.vchiptech.com.