So you have decided to get onto the Data Bandwagon by giving Data-related careers a shot! Despite plenty of information online, the challenge with Data Roles is understanding the different types of roles, how to think about them, and how to execute a plan of action. This Data Analytics RoadMap starts with a simple question!
Why do you want to get into Data?
Thinking about this question is worthwhile, since the answer will help you stay motivated and keep going! Data-related roles are not different from any other role, except that we rely heavily on data to:
- Identify Problems: Data Analytics
- Set Up Data Infrastructure: Data Engineering
- Solve Problems : Data Science Models & Deployment
The above split is not hard and fast, since there is plenty of overlap between skill sets & tools, but it helps us create a picture and figure out our interest areas. We will primarily focus on Data Analytics in this RoadMap.
Identifying Problems
To be able to identify problems, a firm needs to know its goals. At a very high level, these are questions or problem statements like:
- How to increase sales?
- How to increase margins?
To work on these problems, we need to know the current sales and how they are distributed across different products. Similarly, we need to know which products have what margins.
Why SQL
Just to get these data points, a mature business, or even a fairly small one, needs a system that records sales and the prices at which the products were sold. This information is typically stored in databases, and we will need to retrieve some of those records to answer these questions at the start!
To fetch the records, we use commands that select the right columns. In some cases we need sales for just the last year, so we need to filter the records, and so on and so forth!
Given these needs, we will have to be really good at SQL to be able to get accurate data and to filter and group the right records.
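As a rough sketch of what selecting, filtering and grouping look like in practice, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The `sales` table, its columns and the sample rows are all made up for illustration; a real sales database will look different.

```python
import sqlite3

# Hypothetical in-memory database; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "product TEXT, sale_date TEXT, units INTEGER, unit_price REAL, unit_cost REAL)"
)
rows = [
    ("Widget", "2023-03-10", 10, 5.0, 3.0),
    ("Widget", "2024-02-01", 4, 5.0, 3.0),
    ("Gadget", "2024-05-20", 2, 20.0, 12.0),
    ("Gadget", "2024-07-15", 3, 20.0, 12.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

# SELECT picks the columns, WHERE filters to 2024 onwards,
# GROUP BY gives one row per product.
query = """
SELECT product,
       SUM(units * unit_price)               AS revenue,
       SUM(units * (unit_price - unit_cost)) AS margin
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY product
ORDER BY revenue DESC
"""
sales_by_product = conn.execute(query).fetchall()

for product, revenue, margin in sales_by_product:
    print(product, revenue, margin)
```

The same query shape answers both starting questions: revenue shows how sales are distributed across products, and margin shows which products earn more per sale.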
SQL Functions & Plan (1-2 Weeks)
SQL commands can be learned in multiple places, such as W3Schools. But make sure that you understand these specific subtopics:
- SELECT, WHERE, HAVING, GROUP BY
- Aggregate Functions like AVG (the mean), SUM, COUNT
- JOINs: LEFT, RIGHT, INNER
- Window Functions like RANK, used together with PARTITION BY. This is the part where it gets difficult, but it needs to be covered and understood well. LAG, LEAD etc are some of the functions needed to compute operations at different levels.
- CTEs: Common Table Expressions are named temporary result sets defined within a query that we can reference later in the same query.
If we are familiar with these and able to use them comfortably, we should be good to proceed ahead.
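The subtopics above can be tried together in one small sketch, again using Python's built-in sqlite3 module. The `products` and `sales` tables and their rows are hypothetical, and the window functions assume the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE sales (product_id INTEGER, month TEXT, revenue REAL);
INSERT INTO products VALUES (1, 'Widget', 'Tools'), (2, 'Gadget', 'Tools'), (3, 'Gizmo', 'Toys');
INSERT INTO sales VALUES
    (1, '2024-01', 100), (1, '2024-02', 150),
    (2, '2024-01', 300), (2, '2024-02', 250),
    (3, '2024-01', 50);
""")

# JOIN + GROUP BY + HAVING: keep only categories with total revenue above 100.
having_query = """
SELECT p.category, SUM(s.revenue) AS total_revenue
FROM sales AS s
INNER JOIN products AS p ON p.product_id = s.product_id
GROUP BY p.category
HAVING SUM(s.revenue) > 100
"""
big_categories = conn.execute(having_query).fetchall()

# CTE + window functions: rank products within each month, and compare
# each product's revenue with its own previous month using LAG.
window_query = """
WITH monthly AS (
    SELECT p.name, s.month, SUM(s.revenue) AS revenue
    FROM sales AS s
    INNER JOIN products AS p ON p.product_id = s.product_id
    GROUP BY p.name, s.month
)
SELECT name,
       month,
       revenue,
       RANK() OVER (PARTITION BY month ORDER BY revenue DESC) AS rank_in_month,
       revenue - LAG(revenue) OVER (PARTITION BY name ORDER BY month) AS change_vs_prev
FROM monthly
ORDER BY month, rank_in_month
"""
ranked = conn.execute(window_query).fetchall()

for row in ranked:
    print(row)
```

Note that LAG returns NULL (None in Python) for the first month of each product, since there is no previous row to compare against.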
Python/R (1-2 Weeks)
Now that we are able to pull the data required, Python/R will help us wrangle and do operations on these datasets. This is optional for Data Analytics folks but still good to have. Lots of complex logic can be difficult to express in SQL, and knowing these languages gives us the flexibility to do more intricate processing of data.
Key Topics to be covered and well understood:
- Lists: This is the basic container for a collection of values, and many of the later data structures build on the same ideas.
- Data Types: Strings, numeric etc
- DataFrames : This is the building block of most data operations especially tabular data.
- Pandas, and now more scalable libraries such as Polars in Python
- Loops: For, while and so on …
- SQL Connectors: These packages allow us to query SQL tables directly, which can be used to pull specific normalised tables directly into Python.
- Data Visualisation: We need to know how to make plots of the data we retrieved and cleaned, to understand it better.
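To make a few of these topics concrete, here is a minimal sketch that uses only the Python standard library: sqlite3 stands in for the SQL connector, the rows land in a plain list, and a loop does the wrangling. The `sales` table and its rows are made up for illustration, and the text bar chart is only a dependency-free stand-in for a real plotting library.

```python
import sqlite3
from collections import defaultdict

# Hypothetical in-memory table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name
conn.execute("CREATE TABLE sales (product TEXT, units INTEGER, unit_price REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Widget", 4, 5.0), ("Gadget", 2, 20.0), ("Widget", 6, 5.0)],
)

# SQL connector: pull the rows into a Python list of dicts.
cursor = conn.execute("SELECT product, units, unit_price FROM sales")
records = [dict(row) for row in cursor]

# Loop over the list and accumulate revenue per product.
revenue_by_product = defaultdict(float)
for record in records:
    revenue_by_product[record["product"]] += record["units"] * record["unit_price"]

# Crude text bar chart: one '#' per 10 units of revenue.
for product, revenue in sorted(revenue_by_product.items()):
    print(f"{product:<8} {'#' * int(revenue // 10)} {revenue:.0f}")
```

With pandas installed, `pandas.read_sql_query(query, conn)` would return the same rows as a DataFrame, which is usually the more convenient starting point for tabular work.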
Final Piece: Projects/Portfolios (2-4 Weeks)
Once we have learned the basics of SQL & Python, we are good to start practising with projects. This is an iterative process, in the sense that once we learn how to do projects, we might need to learn more SQL and Python to do more complex ones.
There are multiple ways to skin a cat, and knowing what should be done in SQL and what needs Python is something we will figure out over time. This is the heart of the Data Analytics skill set, and a clear Roadmap of practice problems is not available anywhere, since no company will expose its database to anyone outside.
Now, while we have online tools and places to practise Python and SQL individually, it's only through real-world projects that we can get the exact flavour of what real-world problem solving looks like!
Some places where we can pick up problems are:
- Kaggle: The hub of data science & analytics problems
- GitHub: Push your notebooks, SQL files etc to GitHub repositories
If you need help on how to act on this plan and go about it, reach out at admin[at]startupanalytics.in