Data Analyst vs Data Scientist?

Along with the rise of Big Data technology, new job titles such as Data Analyst and Data
Scientist have emerged. Ever wondered what the differences between them are? After all, both
share the word "data". The two roles are similar, yet they differ in important ways. In
this blog, I will bring out these subtleties.

Data Analyst
A Data Analyst is someone who analyzes large data sets, draws inferences from them, and
presents the findings to management using reporting tools. A Data Analyst usually has a degree
in Computer Science or an MBA and additionally needs the following technical skills:

• Basic knowledge of statistics.

• Ability to use statistical programming languages like R, Stata, and SAS to manipulate data.

• Knowledge of programming languages like Python or Ruby for web development, or familiarity
with HTML and JavaScript for front-end development, to present data.

• Knowledge of SQL querying.

• Knowledge of Excel, which can still be useful even though it is an older tool.

• Ability to use open source tools like Hadoop, Hive, Pig, Impala, and HBase to improve
productivity for analysis tasks.
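
As a rough illustration of the SQL skill listed above, here is a minimal sketch that runs an aggregate query with Python's built-in sqlite3 module and turns the result into a one-line summary. The `sales` table, its columns, and the figures are hypothetical and exist only for this example.

```python
import sqlite3

# Hypothetical in-memory sales table; names and figures are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("North", 80.0), ("South", 50.0), ("South", 70.0)],
)

# Total sales per region, largest first.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
conn.close()

# Turn the numbers into a sentence a manager can read.
top_region, top_total = rows[0]
summary = f"{top_region} is the strongest region with {top_total:.0f} in sales."
print(summary)
```

The same query would work against most SQL databases; only the connection code would change.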

Put simply, Data Analysts are people who can convert the numbers in data into plain English.
This helps businesses strategize. The challenge in presenting to management is that even
though the analysis is done with statistical methods and terms, the presentation has to be in
business terminology, which means a Data Analyst needs good communication skills
too. Even though many areas are mentioned above, a Data Analyst need not attempt to
master all of them; he or she can specialize in any one area. This leads us to the question:

How to become a Data Analyst?


There are three starting points:
1. Starting with no knowledge of programming and math.
2. Starting with a programming background.
3. Starting with a strong mathematical background.

Here is a step-by-step guide to upskilling from each one.

Starting with no knowledge of Programming and Math
1. Programming: Programming is a core skill for Data Analysts, and it is the skill that
differentiates a Data Analyst from a Business Analyst. You need to learn a programming
language like Java, R, or Python and gain a good understanding of data science libraries
like ggplot2 and reshape2 (R) or pandas (Python).
2. Statistics: To analyze data, you have to familiarize yourself with descriptive and
inferential statistics. Descriptive statistics help you summarize data and describe it in a
meaningful way, while inferential statistics let you infer properties of a larger data set
from a sample of it. For example, you can check whether a pattern seen in a sample is likely
to hold in general. You may already know some of the basics of descriptive statistics from
school, such as mean, median, mode, standard deviation, and variance. You then need to
learn more advanced skills, like comparing samples drawn from different types of
distribution (standard normal, exponential/Poisson, binomial, chi-square) and applying
tests for significance (Z-test, t-test, Mann-Whitney U, chi-squared, ANOVA). As a Data Analyst
you’ll need to know how many samples to collect, how different factors should be applied
internally, how to choose good control and testing groups, and so on.
3. Math: A strong foundation in math is essential, as data is usually expressed in numbers.
You need to learn linear algebra, matrices, and calculus, and then be able to express
real-life business problems in terms of numbers. For this you will need to be able to
manipulate algebraic expressions and solve equations. Finally, you should be able to
represent data as graphs of functions and highlight the relationships between those
graphs.
4. Machine Learning: You should know the common machine learning algorithms. For a
career as a data analyst, you won’t need to invent new machine-learning algorithms (skills
that advanced are needed to become a data scientist), but you should know the
most common of them. A few examples include principal component analysis, neural
networks, support vector machines, and k-means clustering. It is not mandatory to know
the detailed theory and implementation details of these algorithms, but you should
understand the pros and cons, as well as when to (and when not to) apply them to a dataset.

There are three main types of machine learning:

• Supervised learning

• Unsupervised learning

• Reinforcement learning

In supervised learning, a computer program is provided with two sets of data, a training set
and a test set. The program learns from the labeled examples in the training set so that it
can accurately identify the unlabeled examples in the test set. In effect, it
derives a rule from the training set and applies it to the test set. This is the kind of
learning behind the program on your phone that recognizes your voice.
Common algorithms used for this purpose include decision trees, Naive Bayes
classification, and Ordinary Least Squares regression.
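
The training-set and test-set idea can be sketched with Ordinary Least Squares regression on a single input variable. This is a minimal illustration in plain Python, not a production implementation; the data points and the `fit_ols` helper are made up for the example.

```python
# Labeled training examples as (input, label) pairs. Made-up numbers.
train = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
# Test inputs whose labels the model has not seen.
test_inputs = [5.0, 6.0]


def fit_ols(points):
    """Return (slope, intercept) minimising squared error for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# "Learn the rule" from the training set, then apply it to the test set.
slope, intercept = fit_ols(train)
predictions = [slope * x + intercept for x in test_inputs]
print(slope, intercept, predictions)
```

In practice you would usually reach for a library rather than hand-rolled formulas, but the split between fitting on labeled data and predicting on unseen data stays the same.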

In unsupervised learning, the algorithm draws inferences from datasets consisting of input
data without labeled or known responses. The most
common unsupervised learning method is cluster analysis, which is used for exploratory
data analysis to find hidden patterns. Techniques of this kind are used by services like
Netflix to recommend movies and Flipkart to suggest products that you might like.
Common techniques in unsupervised learning include clustering algorithms, Principal
Component Analysis (PCA), and Singular Value Decomposition (SVD).
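
To make cluster analysis concrete, here is a small sketch of k-means on one-dimensional data in plain Python. The spend figures and the starting centroids are assumptions chosen for illustration; real implementations typically pick starting centroids at random and work on many dimensions.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Unlabeled, made-up customer spend figures.
spend = [10, 12, 11, 50, 52, 49]
centroids, clusters = k_means_1d(spend, centroids=[10, 50])
print(centroids, clusters)
```

Note that no labels are supplied: the two groups (low spenders and high spenders) emerge from the data alone.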

Lastly, reinforcement learning falls between the above two methods of learning. Here, as
the name implies, the computer learns by trial and error: it takes actions in a specific
context and adjusts its behavior based on the rewards or penalties it receives. Some of the
techniques you’ll need to know are Q-Learning, TD-Learning, and genetic algorithms.
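
As a hedged sketch of the Q-Learning update rule, the toy example below learns to walk right along a five-cell corridor to reach a reward. The corridor, the reward, and the parameter values are all assumptions made for illustration. To keep the run deterministic, it sweeps over every state-action pair instead of sampling random episodes, which is a simplification of how Q-Learning is normally run.

```python
# Corridor of 5 cells; reaching cell 4 gives a reward of 1 and ends the episode.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)        # step left or step right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (illustrative values)


def step(state, action):
    """Move within the corridor; return (next_state, reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)


q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

# Deterministic sweeps over all state-action pairs (a simplification);
# the update itself is the standard Q-learning rule.
for _ in range(200):
    for s in range(GOAL):  # the goal state is terminal, so it is never updated
        for a in ACTIONS:
            nxt, reward = step(s, a)
            future = 0.0 if nxt == GOAL else max(q[(nxt, b)] for b in ACTIONS)
            q[(s, a)] += ALPHA * (reward + GAMMA * future - q[(s, a)])

# Greedy policy: the best-valued action in each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)
```

The learned policy moves right in every cell, because the discounted reward is larger the closer the agent is to the goal.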
5. Data Wrangling, Visualization, and Intuition: To collect, organize, and analyze data,
you need to equip yourself with knowledge of SQL querying, Hadoop, Spark, and MongoDB. After
collecting and organizing data, you should know how to present it visually to stakeholders.
Knowing tools like ggplot and matplotlib will help you do so. Apart from these, you
should develop the intuition to know which data sets to consider and which data sets to leave
out.
Starting with Programming background
If you are a software engineer or studied programming languages in college, here are the
things you have to learn before applying for the role of a Data Analyst:

1. Statistics: You should have the statistical skills mentioned above: be able to make
statistical inferences, identify patterns, compare data sets, and apply the right techniques.
2. Math: Linear algebra, matrices, calculus, and the ability to solve equations are the basic
skills needed to manipulate data and represent it as graphs and reports.
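
For someone coming from programming, the statistical skills above translate into code quite directly. The sketch below uses Python's standard `statistics` module to describe two samples and then computes Welch's t statistic to compare them. The samples and the `welch_t` helper are made up for the example; converting the statistic into a p-value needs the t distribution, which a statistics library would provide and which is not shown here.

```python
import math
import statistics

# Made-up samples: page load times in seconds for two versions of a site.
control = [2.0, 2.5, 3.0, 3.5, 4.0]
variant = [1.0, 1.5, 2.0, 2.5, 3.0]

# Descriptive statistics: summarize one sample.
mean_c = statistics.mean(control)
median_c = statistics.median(control)
var_c = statistics.variance(control)  # sample variance (n - 1 denominator)


def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Inferential step: how large is the difference relative to the noise?
t_stat = welch_t(control, variant)
print(mean_c, median_c, var_c, t_stat)
```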

Starting with strong Mathematical background

If you are a math whiz and aspire to be a Data Analyst, you need to acquire the
following programming skills:

1. Basic programming: Variables, loops, functions, control flow, etc.
2. Object Oriented Programming: Learn to design your program so that it is based on
Object Oriented patterns and is easy to develop, test, and maintain.
3. Data Structures: Learn Arrays, Stacks, Queues, Lists, and Graphs.
4. Software Design Patterns: Many robust software design patterns are available, so learn
the common ones.
5. Algorithms: Learn which algorithms to apply to which kinds of problems.
This knowledge makes a huge difference to how long your data analysis takes to
produce useful results.
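
To illustrate why the choice of data structure and algorithm matters, here is a small sketch that finds the IDs common to two lists in two ways. Both functions and the sample IDs are hypothetical. They return the same answer, but the first compares every pair of items while the second builds a set once and then does fast lookups, which matters a great deal once the lists hold millions of records.

```python
def common_ids_nested(a, b):
    """Compare every pair: roughly len(a) * len(b) comparisons."""
    return [x for x in a if any(x == y for y in b)]


def common_ids_set(a, b):
    """Build a set once, then do constant-time lookups on average."""
    seen = set(b)
    return [x for x in a if x in seen]


# Made-up order and refund IDs.
orders = [101, 102, 103, 104, 105]
refunds = [103, 105, 999]

slow = common_ids_nested(orders, refunds)
fast = common_ids_set(orders, refunds)
print(slow, fast)
```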

A Data Scientist is a statistician and a software engineer rolled into one.

What does a Data Scientist do?
• First and foremost, when a business problem like customer retention or cost reduction
is presented to a Data Scientist, he or she helps solve that problem in data-intensive
ways. Usually, in the process of solving such problems, insights are
discovered and inferred from the data sets.

• Parallelize and iterate as fast as possible on the problem to be solved.

• Build data products like dashboards, machine learning models, and tools that others can
use to analyze data.

Data Scientists choose their tools based on the field and context in which they work. The
specific skills that Data Scientists have are:

1. Expertise in math and statistics, to select the right algorithm to apply and derive models.

2. Ability to apply machine learning algorithms to make predictions.

3. Knowledge of R or Python, to do analysis and build models.

4. Sharp business acumen.

In short, a Data Scientist should be an expert in math, statistics, technology, and business.
In reality, one person is rarely an expert in all of these areas. So Data Science teams are
built from members who each have expertise in one area but can talk to any other
team member whose expertise lies in another.

The combination of expertise in these areas is what places a Data Scientist above a Data
Analyst. But it also means that a Data Analyst can grow into a successful Data Scientist.

How to become a Data Scientist is the next obvious question. Apart from earning
a degree in statistics or math, the basic step is to get
trained in:

• Hadoop/Big Data programming.

• Hive, Pig, and Impala.

• Data Science and business applications of Data Science.

• Fundamentals of Machine Learning.

• Apache Mahout.

Conclusion
There is no doubt that data analysis is a fast-growing field. If you are about to embark on a
career in data analysis, the skills listed above are the building blocks. Learning them does
require an investment, but the payoffs are promising indeed!