3/3/2018 What's The Best Path To Becoming A Data Scientist?

JAN 20, 2017 @ 02:15 PM

What's The Best Path To Becoming A Data Scientist?

Quora, CONTRIBUTOR
Opinions expressed by Forbes Contributors are their own.

(Photo by Drew Angerer/Getty Images)

“How can I become a data scientist?” originally appeared on Quora, the knowledge sharing network where
compelling questions are answered by people with unique insights.

Answer by Monica Rogati, Data Science advisor, formerly the VP of Data at Jawbone and
LinkedIn, on Quora:

There’s a lot of interest in becoming a data scientist, and for good reasons: high impact, high job satisfaction,
high salaries, high demand. A quick search yields a plethora of possible resources that could help: MOOCs,
blogs, Quora answers to this exact question, books, Master’s programs, bootcamps, self-directed curricula,
articles, forums and podcasts. Their quality is highly variable; some are excellent resources and programs,
some are click-bait laundry lists. Since this is a relatively new role and there’s no universal agreement on
what a data scientist does, it’s difficult for a beginner to know where to start, and it’s easy to get
overwhelmed.

Many of these resources follow a common pattern: 1) Here are the skills you need and 2) Here is where you
learn each of these. Learn Python from this link, R from this one; take a machine learning class and “brush
up” on your linear algebra. Download the iris data set and train a classifier (“learn by doing!”). Install Spark
and Hadoop. Don’t forget about deep learning: work your way through the TensorFlow tutorial (the one for
ML beginners, so you can feel even worse about not understanding it). Buy that old orange Pattern
Classification book to display on your desk after you’ve given up two chapters in.

https://www.forbes.com/sites/quora/2017/01/20/whats-the-best-path-to-becoming-a-data-scientist/#1b6889e937d2

This makes sense; our educational institutions trained us to think that’s how you learn things. It might
eventually work, too, but it’s an unnecessarily inefficient process. Some programs have capstone projects
(often using curated, clean data sets with a clear purpose, which sounds good but isn’t). Many recognize
there’s no substitute for ‘learning on the job’, but how do you get that data science job in the first place?

Instead, I recommend building up a public portfolio of simple but interesting projects. You will learn
everything you need in the process, perhaps even using all the resources above. However, you will be highly
motivated to do so and will retain most of that knowledge, instead of passively glossing over complex
formulas and forgetting everything in a month. If getting a job as a data scientist is a priority, this portfolio
will open many doors, and if your topic, findings or product are interesting to a broader audience, you’ll have
more incoming recruiting calls than you can handle.

Here are the steps I recommend. They are optimized to maximize your learning and your chances of getting a
data job.

1. Pick a topic you’re passionate or curious about.

Cats, fitness, startups, politics, bees, education, human rights, heirloom tomatoes, labor markets. Research
what datasets are available out there, or datasets you could create or obtain with minimal effort and expense.
Perhaps you already work at a company that has unique data, or perhaps you can volunteer at a nonprofit
that does. The goal is to answer interesting questions or build something cool in a week (it will take longer,
but this will steer you towards something manageable).

Did you find enough to start digging in? Are you excited about the questions you could ask and curious about
the answers? Could you combine this data with other datasets to produce original insights that others have
not explored yet? Census data, ZIP-code or state-level demographic data, weather and climate are popular
choices. Are you giddy about getting started? If your answer is ‘meh’ or this feels like a chore already, start
over with a different topic.
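To make the “combine this data with other datasets” idea concrete, here is a minimal sketch of a left join on a shared ZIP-code key in plain Python. The records, the field names (`zip`, `median_age`) and the `left_join` helper are hypothetical placeholders. They are not real datasets or any library’s API. With pandas, `DataFrame.merge(..., how="left")` does the same job.

```python
# Hypothetical records from "your" dataset (e.g. daily step counts per user).
# ZIP codes are kept as strings so leading zeros survive.
activity = [
    {"user": "a", "zip": "94105", "steps": 8200},
    {"user": "b", "zip": "10001", "steps": 6400},
    {"user": "c", "zip": "60601", "steps": 7100},
]

# Hypothetical demographic lookup keyed by ZIP code, standing in for a
# census-style public table.
demographics = {
    "94105": {"median_age": 35},
    "10001": {"median_age": 38},
}


def left_join(rows, lookup, key, fields):
    """Copy each row and attach `fields` from `lookup`; missing matches become None."""
    joined = []
    for row in rows:
        match = lookup.get(row[key], {})
        merged = dict(row)
        for field in fields:
            merged[field] = match.get(field)
        joined.append(merged)
    return joined


combined = left_join(activity, demographics, key="zip", fields=["median_age"])
for row in combined:
    print(row)

# Rows that found no demographic match deserve a closer look before analysis.
unmatched = [row["user"] for row in combined if row["median_age"] is None]
print("no demographic match:", unmatched)
```

Check the unmatched rows before drawing conclusions. Records that silently fail to join can skew whatever you find.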

2. Write the tweet first.

(A 21st-century, probabilistic take on the scientific method, inspired by Amazon’s “write the press release
first” practice and, more broadly, the Lean Startup philosophy)

You’ll probably never actually tweet this, and you probably think tweets are a frivolous avenue to
disseminate scientific findings. But it’s essential that you write 1-2 sentences about your (hypothetical)
findings before you start. Be realistic (especially about being able to do this in a week) and optimistic (about
actually having any findings, or about their being interesting). Think of a likely scenario; it won’t be accurate (you
can make things up at this point), but you’ll know if this is even worth pursuing.

Here are a few examples, with a conversational hook thrown in:

“I used LinkedIn data to find out what makes entrepreneurs different -- it turns out they’re older
than you think, and they tend to major in physics but not in nursing or theology. I guess it’s hard to
get VC funding to start your own religion.”

“I used Jawbone data to see how weather affects activity levels -- it turns out people in NY are less
sensitive to weather variations than Californians. Do you think New Yorkers are tougher or just

work out indoors?”

“I combined BBC obituary data with Wikipedia entries to see if 2016 was as bad as we thought for
celebrities.”

If your goal is to learn particular technologies or get a job, mention those technologies in the tweet.

From Shelby Sturgis: “I built a web application to help teachers and administrators improve the
quality of student education by providing analytics on school rank, progress on test scores over
time, and performance in different subject areas. I used MySQL, Python, Javascript, Highcharts.js,
and D3.js to store, analyze, and visualize California STAR testing data.”

“I’ve used TensorFlow to automatically colorize and restore black and white photos. Made this giant
collage for Grandma -- best Christmas ever!”

Imagine yourself repeating this over and over at meetups and job interviews. Imagine it as a story in USA Today
or the Wall Street Journal (without the exact technologies; a vague “algorithm” or “AI” will do). Are you
boring yourself and having trouble explaining it, or do you feel proud and smart? If the answer is “meh”,
repeat step 2 (and possibly 1) until you have 2-3 compelling ideas. Get feedback from others: does this
sound interesting? Would you interview somebody who built this for a data job?

Remember, at this point you have not written any code or done any of the data work yet, beyond researching
datasets and superficially understanding which technologies and tools are in demand and what they do,
broadly speaking. It’s much easier to iterate at this stage. It sounds obvious, but people are eager to jump
into a random tutorial or class to feel productive and soon sink months into a project that is going nowhere.

3. Do the work.

Explore the data. Clean it. Graph it. Repeat. Look at the top 10 most frequent values for each column. Study
the outliers. Check the distributions. Group similar values if it’s too fragmented. Look for correlations and
missing data. Try various clustering and classification algorithms. Debug. Learn why they worked or didn’t
on your data. Build data pipelines on AWS if your data is big. Try various NLP libraries on your unstructured
text data. Yes, you might learn Spark, numpy, pandas, nltk, matrix factorization and TensorFlow, not to
check a box next to a laundry list, but because you need it to accomplish something you care about. Be a
detective. Come up with new questions and unexpected directions. See if things make sense. Did you find a
giant issue with how the data was collected? What if you bring in another data set? Ride the data wave. This
should feel exciting and fun, with the occasional roadblock. Get help and feedback online, from Kaggle, from
mentors if you have access to them, or from a buddy doing the same thing. If this does not feel like fun, go
back to step 1. If the thought of that makes you hate life, reconsider being a data scientist: this is as fun as it
gets, and you won’t be able to sustain the hard work and the 80% drudgery of a real data job if you don’t find
this part energizing.
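The first checks listed above (top values per column, missing data, outliers, grouping fragmented values) can be sketched in a few lines of standard-library Python. This is a minimal illustration that assumes a small list-of-dicts dataset. The rows, the case-folding rule in the hypothetical `normalize` helper and the 1.5 * IQR outlier threshold are illustrative choices, not the author’s method. With pandas, `Series.value_counts()` and `Series.isna().sum()` cover the first two checks.

```python
import statistics
from collections import Counter

# Hypothetical rows standing in for a real dataset; None marks a missing value.
rows = [
    {"city": "NY", "steps": 6400},
    {"city": "NY", "steps": 7000},
    {"city": "SF", "steps": 8200},
    {"city": " sf", "steps": None},
    {"city": "LA", "steps": 7600},
    {"city": None, "steps": 50000},
]


def normalize(value):
    """Group near-duplicate labels such as ' sf' and 'SF' (an assumed cleanup rule)."""
    return value.strip().upper() if isinstance(value, str) else value


def profile(rows, top_n=10):
    """Count missing values and the top_n most frequent values in each column."""
    report = {}
    for col in rows[0]:
        values = [normalize(r.get(col)) for r in rows]
        report[col] = {
            "missing": sum(v is None for v in values),
            "top": Counter(v for v in values if v is not None).most_common(top_n),
        }
    return report


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


report = profile(rows)
for col, summary in report.items():
    print(col, "missing:", summary["missing"], "top:", summary["top"])

steps = [r["steps"] for r in rows if r["steps"] is not None]
print("step-count outliers:", iqr_outliers(steps))
```

Tukey’s fences are one common convention, not the only one. The useful part is what comes next: look at each flagged row (here, a 50,000-step day) and decide whether it is a data error or something genuinely interesting.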

4. Communicate.

Write up your findings in simple language, with clean, compelling visualizations that are easy to grasp in
seconds. You’ll learn several data viz tools in the process, which I highly recommend (it’s an underrated
investment in your skills). Have a clean, interesting demo or video if you built a prototype. Technical details
and code should be a link away. Send it around and get feedback. Making it public will hold you to a
higher standard and will result in good-quality code, writing and visualizations.

Now, do it all again. Congratulations, you’ve learned a lot about the latest technologies and you now have a
portfolio of compelling projects. Send a link to the hiring manager on your dream data science team. When
you get the job, send me a Sterling Truffle Bar.

This question originally appeared on Quora, the knowledge sharing network where compelling questions
are answered by people with unique insights. You can follow Quora on Twitter, Facebook, and Google+.
More questions:

Data Analysis: For someone who wants to become better at analyzing data, where would you
recommend starting?

Jobs and Careers in Data Science: What characteristics make for a good data scientist?

Data Science: Which movie / TV series has the best depiction of Data scientist?
