Tools, Trends, What Pays (and What Doesn't) for Data Professionals in Europe
Editor: Shannon Cutt
Designer: Ellie Volckhausen
Production Editor: Shiny Kalapurakkel
Copyright © 2016 O'Reilly Media, Inc. All rights reserved.
Printed in Canada.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.
2017-02-10. First Edition
ISBN: 978-1-491-97750-7
While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
2017 EUROPEAN DATA SCIENCE SALARY SURVEY
YOU CAN PRESS ACTUAL BUTTONS (and earn our sincere gratitude) by taking the 2017 survey: it only takes about 5 to 10 minutes, and is essential for us to continue to provide this kind of research.
oreilly.com/ideas/take-the-2017-data-science-salary-survey
Executive Summary
IN 2016, O'REILLY MEDIA CONDUCTED A DATA SCIENCE SALARY SURVEY ONLINE. The survey contained 40 questions about the respondents' roles, tools, compensation, and demographic backgrounds. About 1,000 data scientists, analysts, engineers, and other professionals working in data participated in the survey, 359 of them from European countries. Here, we take a deep dive into the results from respondents based in Europe, exploring career details and factors that influence salary. Some key findings include:
• Most of the variation in salaries can be attributed to differences in the local economy
• Data professionals who use Hadoop and Spark earn more
• Among those who use R or Python, users of both have the highest salaries
• A few technical tasks correlate with higher salaries: developing prototype models, setting up/maintaining data platforms, and developing products that depend on real-time analytics
• Respondents who use Hadoop, Spark, or Python were twice as likely to have a major increase in salary over the last three years, compared with those whose stack consists of Excel and relational databases
We hope that these findings will be useful as you develop your career in data science.
Introduction
SINCE 2013, WE HAVE CONDUCTED AN ONLINE SALARY SURVEY FOR DATA PROFESSIONALS and published a report on our findings. US respondents typically dominate the sample, at about 60% to 70%. Although many of the findings do appear to apply to people across the globe, we thought it would be useful to show results specific to Europe, looking at finer geographical details and identifying any patterns that seem to apply only to Europe. In this report, we pool all 359 European respondents from the Data Salary Survey over a 13-month period: September 2015 to October 2016.
The median salary of European respondents was €48K, but the spread was huge. For example, the top third earned almost four times as much on average as the bottom third. Such a large variance is not surprising given the differences in the per capita income of the countries represented.
A note on currency: we requested responses about salaries and other monetary amounts in US dollars. In this report, we have converted all amounts into euros, though many European respondents are paid in other currencies, such as pounds or rubles. Over the period in which responses were collected, there were some important shifts in exchange rates, most notably the fall of the pound after Brexit. However, the geographical distribution of responses did not correlate in any meaningful way with any period of collection (e.g., when the pound was high or low), so these currency fluctuations likely translate into noise rather than bias.
In the horizontal bar charts throughout this report, we include the interquartile range (IQR) to show the middle 50% of respondents' answers to questions such as salary. One quarter of the respondents have a salary below the displayed range, and one quarter have a salary above the displayed range. The IQRs are represented by colored, horizontal bars. On each of these colored bars, the white vertical band represents the median value.
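As a rough illustration of how a median and an IQR like those in the charts can be computed, here is a minimal Python sketch using the standard library's `statistics` module. The salary values are made-up placeholders, not survey data, and the helper name `median_and_iqr` is our own; the report does not say which quartile convention it used, so the "inclusive" method below is an assumption.

```python
from statistics import median, quantiles


def median_and_iqr(values):
    """Return (first quartile, median, third quartile) for a list of numbers."""
    # n=4 splits the data into quartiles; the three cut points are Q1, Q2, Q3.
    # The "inclusive" method is an assumption, not the report's stated choice.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q1, median(values), q3


# Hypothetical salaries in thousands of euros (placeholders, not survey data).
salaries = [17, 22, 35, 48, 53, 63, 96, 117]
q1, med, q3 = median_and_iqr(salaries)
print(f"IQR: {q1}K to {q3}K, median: {med}K")
```

By construction, one quarter of the values fall below `q1` and one quarter above `q3`, which is what the colored bars in the charts convey.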
Figure: Base salary (euros), share of respondents. Axes: base salary in 20K bins from 0K to over 180K, versus share of respondents.
Countries
THE UK WAS THE MOST WELL-REPRESENTED EUROPEAN COUNTRY, with about a quarter of the sample, followed by Germany, Spain, and the Netherlands. By far, the highest salaries were in Switzerland, with a median salary of €117K, followed by Norway with €96K, although the latter figure is only based on five respondents. Among countries represented by more than just a handful of respondents, the UK had the second-highest median salary: €63K (£53K).
Even within Western Europe, there was significant variation in salary. While UK, Swiss, and Scandinavian salaries were significantly higher than the Western European median of €54K, Spanish and Italian respondents tended to have much lower salaries (€35K). Portugal was somewhat of an outlier in Western Europe, with a median of €22K. The median salaries of Germany, the Netherlands, and France were close to the regional median (about €53K).
Salaries drop dramatically as we move south and east. The median salary of respondents from Central and Eastern Europe was €17K. Russia and Poland, the two most well-represented countries in this half of the continent, also had median salaries of €17K: unlike in the west, Eastern European salaries appeared to be fairly consistent, even across national borders.
Figure: Countries, share of respondents. Countries shown: United Kingdom, Germany, Spain, Netherlands, France, Ireland, Russia, Switzerland, Poland, Italy.
Figure: Countries, salary median and IQR* (euros), for the same ten countries.
*The interquartile range (IQR) is the middle 50% of respondents' salaries. One quarter of respondents have a salary below this range, one quarter have a salary above this range.
NATIONAL MEDIAN SALARIES SHOULD BE EXPECTED TO VARY according to the economic conditions of the country, so the question becomes: given a country's economy (in particular, its per capita GDP), do the salaries of data scientists and engineers vary? Here, we plot per capita GDP and median salary of each country in the sample. The resulting graph is remarkably linear, with outliers largely explained by small sample size: Greece, for example, has a higher-than-expected median salary given a relatively low per capita GDP, but this is based on just one respondent.
One shortcoming of this plot is that it does not take into account years of experience, which turns out to be very uneven in the sample among different countries. In particular, respondents from Western Europe tended to be much more experienced (with an average of seven years) than respondents from Eastern Europe (with an average of four years). Since experience correlates with salary, the West-East salary difference is exaggerated due to this experience differential.
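To make the idea of a "remarkably linear" relationship with a few outliers concrete, the sketch below fits a least-squares line of median salary against per capita GDP and flags the country that sits furthest from the line. It is only an illustration: the country labels and numbers are hypothetical placeholders rather than survey values, and the helper name `fit_line` is our own.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical countries with (per capita GDP, median salary) in thousands
# of euros. These are illustrative placeholders, not values from the survey.
countries = ["A", "B", "C", "D", "E"]
gdp = [10, 20, 30, 40, 15]
salary = [15, 35, 55, 75, 40]

slope, intercept = fit_line(gdp, salary)

# Residual = observed salary minus the salary predicted from GDP.
residuals = [y - (slope * x + intercept) for x, y in zip(gdp, salary)]
outlier_index = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
print(f"slope={slope:.2f}, largest residual: country {countries[outlier_index]}")
```

In practice, a large residual for a country with very few respondents (as with Greece above) says more about sample size than about that country's labor market.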
Figure: Salary versus GDP. The size of each circle represents the number of respondents from the country in the sample. Axes: median salary of data scientists/engineers (thousands of euros, 0K to 140K) versus per capita GDP (thousands of euros, 0K to 80K). Countries plotted: Switzerland, Norway, Ireland, Denmark, Sweden, Austria, Netherlands, United Kingdom, Belgium, Finland, Germany, France, Italy, Spain, Slovenia, Portugal, Czech Republic, Estonia, Greece, Hungary, Slovakia, Poland, Croatia, Turkey, Romania, Russia, Belarus, Serbia, Armenia.
Company Size
COMPARED TO THE WORLDWIDE SAMPLE, THE
SUBSAMPLE FROM EUROPE TENDED TO COME FROM
SMALLER COMPANIES. While 45% of US respondents were
from companies with over 2,500 employees, only 35% of
European respondents were from such companies. This number
rises to 39% if we consider only those from Western Europe;
only 13% of respondents from Central/Eastern Europe were
from large companies.
Largely because of the East-West split, salaries at larger companies tend to be high: the 19% of respondents from companies with over 10,000 employees had a median salary of €61K. In contrast, the half of the sample that was from companies with 2 to 500 employees had a median salary of €43K.
Figure: Company size (number of employees), share of respondents. Shares recoverable from the chart labels: 10,000+: 19%; 2,501 to 10,000: 16%; 1,001 to 2,500: 8%; 501 to 1,000: 5%; 26 to 100: 17%; 2 to 25: 11%; 1: 1%. The chart also includes a 101 to 500 category.
Figure: Company size, salary median and IQR (euros, 0K to 120K), by number of employees.
Industry
A PLURALITY OF RESPONDENTS (20%) WORKED IN
CONSULTING, after which the top industries were software
(18%), banking/finance (10%), and retail/ecommerce (9%).
These figures are very similar to those of the worldwide
sample.
As with company size, the differences in salaries among industries were largely attributable to geography. Respondents in manufacturing, insurance, and publishing/media came disproportionately from countries with higher salaries. One exception to this was banking/finance, which had a high median salary of €58K and did not correlate with a particular country or region: data professionals in banking do appear to earn more.
Figure: Industry, share of respondents.
• Consulting: 21%
• Software: 18%
• Banking/Finance: 10%
• Retail/Ecommerce: 9%
• Education: 6%
• Healthcare/Medical: 6%
• Advertising/Marketing/PR: 6%
• Carriers/Telecommunications: 5%
• Manufacturing/Heavy Industry: 5%
• Publishing/Media: 5%
• Entertainment: 3%
• Other: 3%
• Insurance: 2%
Tools
THE TOP FOUR TOOLS FROM EUROPEAN RESPONDENTS WERE EXCEL, SQL, R, AND PYTHON, each used by over half of all respondents. These four tools have kept their top positions in every Data Salary Survey we have conducted, and there does not appear to be any sign of this changing. Almost every respondent reported using at least one, and about half the sample used three or all four.
Commonly used tools with above-average salaries include Scikit-learn (whose users have a median salary of €52K), Spark (€55K), Hive (€57K), and Scala (€70K). Readers may notice that most tools have a higher median salary than the sample-wide median salary [...] those who used more than 10 tools had a median salary of €53K.
Since there is significant overlap between users of individual tools, it is useful to consider mutually exclusive groups of respondents based on tool usage. The groups we will define here are based on a simple set of rules, but using a clustering algorithm would produce very similar results. The rules are:
1) If someone used Spark or Hadoop, we call them "Hadoop"
2) If someone (not in the Hadoop group) uses R and/or Python, they are labeled "R+Python", "R-only", or "Python-only", as appropriate
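The grouping rules above can be sketched as a small Python function. This is an illustration under stated assumptions rather than the survey's actual code: the function name `tool_group` and the exact tool-name strings are hypothetical, and the fallback label "SQL/Excel" is borrowed from the report's later description of respondents who did not use Hadoop, Python, or R.

```python
def tool_group(tools):
    """Assign a respondent to exactly one tool-usage group.

    `tools` is an iterable of tool names; matching is case-insensitive.
    The tool-name strings used here are assumptions for illustration.
    """
    used = {t.strip().lower() for t in tools}

    # Rule 1: Spark or Hadoop users form the "Hadoop" group.
    if used & {"spark", "hadoop"}:
        return "Hadoop"

    # Rule 2: among the rest, split by R and/or Python usage.
    uses_r = "r" in used
    uses_python = "python" in used
    if uses_r and uses_python:
        return "R+Python"
    if uses_r:
        return "R-only"
    if uses_python:
        return "Python-only"

    # Everyone else: the report's "SQL/Excel" group (no Hadoop, Python, or R).
    return "SQL/Excel"


print(tool_group(["Excel", "SQL", "R", "Python"]))
```

Because the rules are checked in order and each branch returns immediately, every respondent lands in exactly one group, which is what makes the groups mutually exclusive.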
Figure: Tools, share of respondents (axis: 0% to 70%).
Figure: Tools, salary median and IQR (axis: 0K to 100K).
Tools shown in both charts: Excel, SQL, R, Python, ggplot, MySQL, Scikit-learn, Bash, Matplotlib, Spark, Microsoft SQL Server, PostgreSQL, Oracle, Tableau, Hive, D3, Java, JavaScript, Shiny, Spark MlLib, Apache Hadoop, Cloudera, ElasticSearch, Scala, MongoDB, Visual Basic/VBA, QlikView, Matlab, Hortonworks, SQLite, Google Charts, Impala, Kafka, Hbase, C, C++, Power BI, Weka.
Tasks
WE ALSO ASKED FOR INFORMATION ABOUT WORK TASKS: this is meant to dig a little deeper than what we can glean from a job title. Respondents could say they had "major" or "minor" involvement in each task. For the most part, tasks that correlate positively with salary also correlate positively with years of experience (and often are clearly associated with being a manager).
Among the most common tasks were basic exploratory data analysis, data cleaning, creating visualizations, and conducting data analysis to answer research questions, each with 85% to 93% of the sample as a major or minor task. Data cleaning has the unfavorable distinction of being the only task for which each level of involvement means less pay: those with major involvement earn less than those with minor involvement, who in turn earn less than those who never clean data. However, this may have more to do with the fact that more-experienced data professionals (who we know earn more) tend to do less data cleaning.
Tasks that correlate most strongly with high salaries are those that involve management and business decisions, such as communicating findings to business decision-makers, identifying business problems to be solved with analytics, organizing and guiding team projects, and communicating with people outside of your company. The median salaries of respondents who reported major involvement in these tasks were €54K, €56K, €66K, and €55K, respectively.
Aside from management and business strategy, several technical tasks stood out for above-average salaries: developing prototype models (major involvement: €52K), setting up/maintaining data platforms (€50K), and developing products that depend on real-time analytics (€62K). For each of these tasks, respondents who reported major involvement earned more than those who reported minor involvement, and those who reported minor involvement earned more than those who did not engage in these tasks at all.
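The comparison across involvement levels described above (none, minor, major) amounts to checking whether the median salary moves in one direction as involvement increases. A minimal sketch of that check follows; the records are hypothetical placeholders rather than survey responses, and the names `median_by_involvement` and `trend` are our own.

```python
from statistics import median

# Involvement levels in increasing order.
LEVELS = ["none", "minor", "major"]


def median_by_involvement(records):
    """Map each involvement level to the median salary of its respondents."""
    groups = {level: [] for level in LEVELS}
    for level, salary in records:
        groups[level].append(salary)
    return {level: median(vals) for level, vals in groups.items() if vals}


def trend(medians):
    """Describe how the median salary changes as involvement increases."""
    ordered = [medians[level] for level in LEVELS if level in medians]
    pairs = list(zip(ordered, ordered[1:]))
    if all(a < b for a, b in pairs):
        return "increasing"
    if all(a > b for a, b in pairs):
        return "decreasing"
    return "mixed"


# Hypothetical (involvement, salary in thousands of euros) records for one
# task; these are placeholders shaped like the data-cleaning pattern.
records = [
    ("none", 60), ("none", 56),
    ("minor", 50), ("minor", 52),
    ("major", 44), ("major", 46),
]
medians = median_by_involvement(records)
print(medians, trend(medians))
```

A "decreasing" result on its own does not show that the task lowers pay; as the text notes for data cleaning, experience may explain the pattern.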
Figure: Respondent categories based on tool usage, share of respondents.
• Hadoop/Spark: 26%
• Python+R: 22%
• R only: 18%
• Python only: 13%
• SQL/Excel (no Py/R): 19%
Figure: Respondent categories based on tool usage, salary median and IQR (euros).
Figure: "Which of the following most accurately describes the next step you would like to take to advance your career?", share of respondents.
• Learn new technology/skills: 41%
• Work on more interesting/important projects: 20%
• Move into leadership roles: 18%
• Switch companies: 12%
• Start your own company: 6%
Figure: Next career step, salary median and IQR (euros).
Figure: Tasks, number of respondents. Respondents counted if they said they have "major involvement" in this task.
Figure: Tasks, salary median and IQR (euros).
Tasks shown in both charts: ETL; organizing and guiding team projects; developing dashboards; communicating with people outside your company; planning large software projects or data systems; teaching/training others; developing data analytics software; setting up/maintaining data platforms; developing products that depend on real-time data analytics; using dashboards and spreadsheets (made by others) to make decisions.
Figure: Time spent coding, share of respondents.
• None: 9%
• 1 to 3 hours/week: 10%
• 4 to 8 hours/week: 23%
• 9 to 20 hours/week: 36%
• Over 20 hours/week: 23%
Figure: Time spent in meetings, share of respondents.
• None: 2%
• 1 to 3 hours/week: 29%
• 4 to 8 hours/week: 43%
• 9 to 20 hours/week: 23%
• Over 20 hours/week: 3%
Figures: Salary median and IQR (euros) by hours coding and by hours in meetings.
Salary Change
AN ALTERNATIVE METRIC TO CURRENT SALARY is the amount that one's salary changed in the last three years. Most respondents' salaries grew at least a little in the last three years, and about a third of the sample saw their wages rise by 50% or more over this period. This latter group tended to be less experienced, with an average of 4.4 years of experience (compared to 7.6 years among those whose salaries did not grow by 50% or more).
Using the tool-defined groups from the Tools section, Spark/Hadoop and Python-only users were the most likely to have had 50% or more wage growth (40% and 44% of them did, respectively). Respondents who did not use Hadoop, Python, or R (the SQL/Excel group) were the least likely: only 19% of them reported a 50% rise in their salaries.
A final question asked respondents about the next step they would like to take in their career. The top response was "learn new technology/skills", and respondents who gave this answer tended to be less experienced (5.5 years on average) and have smaller salaries (€40K median) than the rest of the sample.
Respondents who said they would like to move into leadership roles had salaries far above average (€65K median). The other top responses were "work on more interesting/important projects", "switch companies", and "start your own company". Respondents who work in the healthcare industry were far more likely to choose "switch companies" (33%) than respondents from other industries (11%).
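For readers who want to compute this metric for their own data, the following sketch shows one way to derive the three-year percentage change and the share of people with a rise of 50% or more. The salary pairs are hypothetical, the function names are our own, and keeping respondents with a zero starting salary in the denominator (without counting them as a 50% rise) is an assumption, not something the report specifies.

```python
def percent_change(old_salary, new_salary):
    """Percentage change from old to new salary; None if the old salary was zero."""
    if old_salary == 0:
        # Undefined change, matching the chart's "N/A (salary was zero)" slice.
        return None
    return (new_salary - old_salary) / old_salary * 100


def share_with_major_raise(pairs, threshold=50):
    """Fraction of all respondents whose salary rose by `threshold` percent or more."""
    changes = [percent_change(old, new) for old, new in pairs]
    major = sum(1 for c in changes if c is not None and c >= threshold)
    # Assumption: respondents with an undefined change stay in the denominator.
    return major / len(pairs)


# Hypothetical (salary three years ago, salary now) pairs in thousands of euros.
history = [(40, 40), (30, 48), (20, 45), (0, 25), (50, 45), (40, 44)]
share = share_with_major_raise(history)
print(f"{share:.0%} of respondents had a raise of 50% or more")
```

Changing `threshold` reproduces the other bands in the chart, such as doubling (100) or tripling (200).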
Figure: Percentage change in salary over last three years, share of respondents.
• Negative change: 7%
• No change: 17%
• +0% to +10%: 6%
• +10% to +20%: 11%
• +20% to +30%: 6%
• +30% to +40%: 7%
• +40% to +50%: 5%
• +50% to +75%: 11%
• +75% to +100% (double): 5%
• +100% to +200% (triple): 10%
• Over triple: 6%
• N/A (salary was zero): 7%
Conclusion
THE PURPOSE OF OUR SALARY SURVEYS and the reports based on them is to provide an annual, data-driven snapshot of how much professionals in your field make, and to expose details of their work and career. There are plenty of resources out there that can give an idea of how much a data scientist can expect to earn or which software tools are on the rise, but there aren't many places where these data points are integrated into one report.
This information isn't just for employees, either. Business leaders choosing technologies need to consider not just the software costs, but labor expenses as well. We hope that the information in this report will aid the task of building estimates for such decisions.
If you made use of this report, please consider taking the online survey. Every year, we work to build on the last year's report, and much of the improvement comes from increased sample sizes. This is a joint research effort, and the more interaction we have with you, the deeper we will be able to explore the data science space in Europe. Thank you!
We need your data.
To stay up to date on this research, your participation is
critical. The survey is now open for the 2017 report, and if
you can spare just 10 minutes of your time, we encourage
you to take the survey.
oreilly.com/ideas/take-the-2017-data-science-salary-survey
How do data science salaries for people in Europe compare to their counterparts in the rest of the world? Among the more than 1,000 people who responded to O'Reilly's 2016 Data Science Salary Survey, 359 live and work in various European countries as data scientists, analysts, engineers, and related professions.
This report takes a deep dive into the survey results from respondents in various regions of Europe, including the tools they use, the compensation they receive, and the roles they play in their respective organizations. Even if you didn't take part in the survey, you can still plug your own information into the survey's simple linear model to see where you fit.
With this report, you'll learn:
• How salaries vary by country and specific regions in Europe
• The average size of the companies respondents work for, according to region
• How a respondent's salary is affected by their country's gross domestic product
• The type of industry they work for, including software, banking and finance, and retail and ecommerce
• Which tools are most commonly used vs the tools used by respondents with above-average salaries
• The major and minor tasks that respondents perform
John King is a data scientist at O'Reilly Media. Having previously worked on survey-based sociolinguistic research in the Republic of Georgia, he now runs surveys at O'Reilly, using the results not just for internal use but also to share his findings with the public.