Along with the rise of Big Data technology, new job titles such as Data Analyst and Data
Scientist have emerged. Ever wondered what the differences between them are? After all, both
titles share the word "data". The two roles are similar, yet they differ in important ways. In
this blog, I will bring out this subtlety.
Data Analyst
A Data Analyst is someone who analyzes large data sets, draws inferences from them, and
presents the findings to management using reporting tools. A Data Analyst usually has a degree
in Computer Science or an MBA and additionally needs the following technical skills:
Ability to use statistical programming languages like R, STATA, and SAS to manipulate data.
Knowledge of programming languages like Python or Ruby for web development, or
familiarity with HTML and JavaScript for front-end development, to present data.
Ability to use open source tools like Hadoop, Hive, Pig, Impala, and HBase to improve
productivity in analysis tasks.
Put simply, Data Analysts are people who can convert the numbers in data into plain English
sentences. This helps businesses strategize. The challenge in presenting to management is that,
even though the analysis is done with statistical methods and terms, the presentation should be
in business terminology, which means a Data Analyst needs good communication skills
too. Even though many areas are mentioned above, a Data Analyst need not attempt to
master all of them – he or she can specialize in any one area. This leads us to the question:
Unsupervised learning
Reinforcement learning.
In supervised learning, a computer program is provided with two sets of data, a training set
and a test set. The computer uses the set of labeled examples in the training set to learn and
identify unlabeled examples in the test set accurately. The computer program ultimately
creates a rule and uses it on the test set. This is the type of program that sits in your phone
and recognizes your voice.
Commonly used techniques for this purpose include decision trees, Naive Bayes
classification, and Ordinary Least Squares regression.
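To make the training set / test set idea concrete, here is a minimal sketch of Ordinary Least Squares regression in plain Python. The data (hours studied versus exam score) and the function names `fit_ols` and `predict` are made up for illustration; they are assumptions, not something from a particular library.

```python
def fit_ols(xs, ys):
    """Fit y = slope * x + intercept by Ordinary Least Squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned rule to a new, unlabeled example."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical labeled training set: hours studied -> exam score.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [52.0, 54.0, 56.0, 58.0, 60.0]

# Hypothetical unlabeled test set: only the inputs are known.
test_x = [6.0, 7.0]

model = fit_ols(train_x, train_y)            # learn the rule from labeled data
test_predictions = [predict(model, x) for x in test_x]
print(model, test_predictions)
```

The "rule" the program creates is just the pair (slope, intercept); applying it to the test inputs is the prediction step described above.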
Lastly, the learning that falls between the above two methods is reinforcement
learning. Here, as the name implies, the computer learns which action to take in a specific
context from the rewards or penalties its earlier actions produced. Some of the techniques
you'll need are Q-Learning, TD-Learning, and genetic algorithms.
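As a rough illustration of Q-Learning, the sketch below trains a tabular agent on a toy problem. The five-cell corridor, the reward of 1 at the right end, and the constants `ALPHA` and `GAMMA` are all assumptions chosen for this example. To keep the result reproducible, it also replaces the usual random exploration with a fixed sweep over every state-action pair, which is a simplification.

```python
N_STATES = 5          # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right
ALPHA = 0.5           # learning rate (assumed value)
GAMMA = 0.9           # discount factor (assumed value)


def step(state, action_index):
    """Return (next_state, reward); the walls clamp the position."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(sweeps=200):
    """Run the Q-learning update over every state-action pair, repeatedly."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for state in range(N_STATES - 1):        # the goal state is terminal
            for a in range(len(ACTIONS)):
                next_state, reward = step(state, a)
                # Q-learning target: reward plus discounted best future value
                target = reward + GAMMA * max(q[next_state])
                q[state][a] += ALPHA * (target - q[state][a])
    return q


q_table = train()
# Greedy policy: in each non-terminal state, pick the action with the highest Q-value.
greedy_policy = [
    max(range(len(ACTIONS)), key=lambda a: q_table[s][a])
    for s in range(N_STATES - 1)
]
print(greedy_policy)
```

In this toy setting the learned policy steps right from every cell, because that is the only direction that ever leads to a reward.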
5. Data Wrangling, Visualization, and Intuition: To collect, organize, and analyze data,
you need to equip yourself with knowledge of SQL querying, Hadoop, Spark, and MongoDB. After
collecting and organizing data, you should know how to present it visually to stakeholders.
Knowing tools like ggplot and matplotlib will help you do so. Apart from these, you
should have an instinct for which data sets to consider and which data sets to leave
out.
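A small sketch of the "collect, organize, analyze" step using SQL from Python's built-in `sqlite3` module is shown here. The `sales` table, its columns, and the figures in it are invented for illustration only.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Hypothetical raw data, including one record with a missing amount.
rows = [
    ("North", 120.0),
    ("South", 80.0),
    ("North", 60.0),
    ("South", None),
    ("East", 40.0),
]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Wrangle and analyze: skip missing values, then total the sales per region.
query = """
    SELECT region, SUM(amount) AS total, COUNT(*) AS n
    FROM sales
    WHERE amount IS NOT NULL
    GROUP BY region
    ORDER BY total DESC
"""
summary = conn.execute(query).fetchall()
conn.close()

for region, total, n in summary:
    print(f"{region}: {total} from {n} sale(s)")
```

A summary like this is typically what gets handed to a plotting tool such as matplotlib or ggplot for the visualization step.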
Starting with a Programming Background
If you are a software engineer or studied programming languages in college, here are the
things you have to learn before applying for the role of a Data Analyst:
1. Statistics: You should have the statistical skills mentioned above – be able to make
statistical inferences, identify patterns, compare data sets, and apply the right techniques.
2. Math: Linear algebra, matrices, calculus, and the ability to solve equations are the basic
skills needed to manipulate data and represent it as graphs and reports.
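As one hedged example of "comparing data sets", the sketch below computes Welch's t statistic for two small samples with Python's `statistics` module. The choice of Welch's t statistic, the function name `welch_t`, and the sample values are assumptions made for illustration; the statistic is only one of many ways to compare two groups.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic: difference in means divided by its standard error."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    var_a = statistics.variance(sample_a)   # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical measurements from two groups (made-up numbers).
group_a = [12.0, 14.0, 13.0, 15.0, 16.0]
group_b = [10.0, 11.0, 9.0, 12.0, 8.0]

t_stat = welch_t(group_a, group_b)
print(round(t_stat, 3))
```

The larger the absolute value of the statistic, the harder it is to attribute the difference between the two groups' means to chance alone.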
Data Scientists choose the tools based on the field and context in which they work. The
specific skills that Data Scientists have are:
1. Expertise in math and statistics – to select the right algorithm to apply and derive models.
In short, a Data Scientist should be an expert in math, statistics, technology, and business.
In reality, however, one person is unlikely to be an expert in all of these areas. So there are
Data Science teams in which each member has expertise in one area but is able to talk to any
other team member whose expertise lies elsewhere.
The combination of expertise in these areas is what places a Data Scientist above a Data
Analyst. But it also means that a Data Analyst can grow into a successful Data Scientist.
How to become a Data Scientist is the next obvious question. Apart from equipping oneself
with a degree in statistics or math, the basic steps are to get
trained in:
Apache Mahout.
Conclusion
There is no doubt that data analysis is a mushrooming field. If you are about to embark on a
career in data analysis, the skills listed above are the building blocks. Learning them does
demand an investment, but the payoffs are promising indeed!