
FIVE THINGS I LEARNED WHILE
PROTOTYPING ML PAPERS

ELLEN KÖNIG / @ELLEN_KOENIG
SENIOR DATA SCIENTIST
NATIVE INSTRUMENTS
A LONG, LONG TIME AGO…
IN AN OFFICE FAR AWAY…
HOW MANY OF YOU CAN
RELATE TO OUR PROBLEM?
😨
BUT WORK IS ALL ABOUT
GROWTH, RIGHT??
FORTUNATELY
OUR USE
CASE
WHEN SHOULD YOU LOOK FOR
RESEARCH PAPERS?

• "Somebody must have solved this before!"

• No ready-to-use implementation
A WORKFLOW FOR
PROTOTYPING ML PAPERS

1. Search for research findings

2. Decide on comparison criteria

3. Evaluate your papers

4. Prioritize approaches

5. Prototype approaches
STEP 1: SEARCH FOR RESEARCH
FINDINGS

Needed:
An overview of the field
COMPILING AN OVERVIEW OF
THE FIELD

Start with survey papers, follow references

Compile:

• Problems and approaches

• Foundational and cutting edge papers
STEP 2: DECIDE ON YOUR
COMPARISON CRITERIA
WHICH PAPERS ARE RIGHT FOR
YOU?

Summarize common metrics and baselines

Pick simple metrics and baselines

Minimally required metric targets?

Refresher on baselines: https://www.quora.com/What-does-baseline-mean-in-machine-learning
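One simple baseline, sketched here as an illustration and not taken from the slides, is to always predict the most frequent class. The function name and the document labels below are hypothetical.

```python
from collections import Counter


def majority_class_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return most_common_label, hits / len(test_labels)


# Hypothetical labels, invented for illustration only.
train = ["invoice"] * 6 + ["letter"] * 3 + ["contract"] * 1
test = ["invoice", "invoice", "letter", "invoice", "contract"]

baseline_label, baseline_accuracy = majority_class_baseline(train, test)
print(baseline_label, baseline_accuracy)
```

A paper's approach is only interesting if it clearly beats a number like this on comparable data.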
STEP 3: EVALUATE YOUR
PAPERS

Groundbreaking? Copycat? Garbage?

Journal / conference quality?

Team experience?
STEP 3: EVALUATE YOUR
PAPERS — A CHECKLIST

1. Abstract & Introduction

2. Methodology

3. Results


ABSTRACT & INTRODUCTION

Main question: Relevant to your problem?

Similar context? Addresses your problem?

Approach: Groundbreaking or improvement?

Results: Better than targets & baseline?

✔ Abstract & Introduction
2. Methodology
3. Results
METHODOLOGY SECTION

Main question: Approach reproducible?

Data set size and content similar?

✓ 22k black-and-white pages

✓ German corpus

? Research documents rather than banking documents

✔ Abstract & Introduction
✔ Methodology
3. Results
METHODOLOGY SECTION

Entire process described?

✓ Seems to be complete

Pre-processing steps described completely?

✓ Image conversion and scaling are described

? OCR tool / approach is not mentioned

Well-known methods? Or completely described methods?

✓ Neural network with descriptions of the configuration

✔ Abstract & Introduction
✔ Methodology
3. Results


RESULTS SECTION

Main question: Results reliable?

Relevant metrics for your use case?

✓ Accuracy

Metrics appropriate for the problem?

✓ Common metric for classification

Metrics appropriate for the dataset?

X Not suitable for imbalanced classes

✔ Abstract & Introduction
✔ Methodology
✔ Results
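Why accuracy can mislead on imbalanced classes, as a minimal sketch: a classifier that always predicts the frequent class scores high accuracy while never finding the rare class. Balanced accuracy (the mean of per-class recall) is one common alternative; it is an example chosen here, not a metric named in the slides, and the labels are invented.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in indices) / len(indices))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced data: 95 "common" documents, 5 "rare" ones.
y_true = ["common"] * 95 + ["rare"] * 5
# A degenerate classifier that always predicts the frequent class.
y_pred = ["common"] * 100

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(acc, bal_acc)
```

Here plain accuracy looks strong while balanced accuracy sits at chance level for two classes.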


RESULTS SECTION

Better than your baseline?

✓ Yes, by 0.23 over the baseline

Better than the metrics target?

? They are close

Any published review of the results?

? Not yet

Improvement analyzed with suitable statistical tests?

X No statistical analysis, and reported measurements are not comparable

✔ Abstract & Introduction
✔ Methodology
✔ Results
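If a paper reports no statistical analysis, you can run a test yourself once you have paired scores for the approach and the baseline. The sketch below uses an exact paired sign-flip permutation test, which is one possible choice and not a test recommended in the slides. The per-fold accuracies are hypothetical, and the test assumes the paired differences are exchangeable, which holds only approximately for cross-validation folds.

```python
from itertools import product


def paired_sign_flip_p_value(scores_a, scores_b):
    """Two-sided exact sign-flip permutation test on paired score differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    total = 0
    # Under the null hypothesis each difference is equally likely to be + or -.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        statistic = abs(sum(s * d for s, d in zip(signs, diffs)))
        if statistic >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical per-fold accuracies, invented for illustration only.
new_approach = [0.81, 0.79, 0.83, 0.80, 0.82]
baseline = [0.78, 0.77, 0.80, 0.79, 0.80]

p_value = paired_sign_flip_p_value(new_approach, baseline)
print(p_value)
```

With only five pairs the smallest possible two-sided p-value is 2/32, so more folds or repeated runs are needed before small p-values are even reachable. The enumeration grows as 2^n, so this exact version only suits a small number of pairs.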
RESULTS SECTION

• For a refresher on model evaluation, see: http://www.oreilly.com/data/free/files/evaluating-machine-learning-models.pdf

• For a summary of statistical tests, see: http://www.pnrjournal.com/viewimage.asp?img=JPharmNegativeResults_2010_1_2_61_75708_f1.jpg

✔ Abstract & Introduction
✔ Methodology
✔ Results


STEP 4: PRIORITIZE YOUR
CHOSEN APPROACHES
PRIORITIZATION MATRIX

Impact vs. effort:

High impact, low effort: Quick Wins
High impact, high effort: Major Projects
Low impact, low effort: Fill-in Jobs
Low impact, high effort: Thankless Tasks
STEP 5: PROTOTYPE YOUR
CHOSEN APPROACHES
A FEW RECOMMENDATIONS

Compile a glossary

Understand all equations & code

Higher level language

Reference sections of papers
A FEW MORE
RECOMMENDATIONS

Verify under same conditions

Compile the performance in a table
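A performance table can be as simple as aligned plain text that puts reported and reproduced numbers side by side. The helper name, column names, and all values below are hypothetical and only show the shape of such a table.

```python
def format_results_table(rows, columns):
    """Render a list of dicts as an aligned plain-text table."""
    cells = [[str(row.get(col, "")) for col in columns] for row in rows]
    widths = [
        max([len(col)] + [len(line[i]) for line in cells])
        for i, col in enumerate(columns)
    ]

    def fmt(values):
        return " | ".join(v.ljust(w) for v, w in zip(values, widths)).rstrip()

    lines = [fmt(columns), "-+-".join("-" * w for w in widths)]
    lines += [fmt(line) for line in cells]
    return "\n".join(lines)


# Hypothetical results, invented for illustration only.
results = [
    {"approach": "majority-class baseline", "source": "own run", "accuracy": 0.55},
    {"approach": "paper model", "source": "reported", "accuracy": 0.80},
    {"approach": "paper model", "source": "reproduced", "accuracy": 0.76},
]

table = format_results_table(results, ["approach", "source", "accuracy"])
print(table)
```

Keeping a "source" column makes it obvious which numbers were measured under your own conditions and which were only reported.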
MORE RECOMMENDATIONS

http://codecapsule.com/2012/01/18/how-to-implement-a-paper/
SUMMARY: WHEN SHOULD YOU
LOOK FOR RESEARCH PAPERS?

• "Somebody must have solved this before!"

• No ready-to-use implementation
SUMMARY: A WORKFLOW FOR
PROTOTYPING ML PAPERS

1. Search for research findings

2. Decide on your comparison criteria

3. Evaluate quality, relevance and reproducibility

4. Prioritize your chosen approaches

5. Prototype the best approaches


HAVE (MORE 🙃) FUN PROTOTYPING!

Slides will be tweeted from @ellen_koenig


IMAGE CREDITS

• Title slide: https://www.flickr.com/photos/vblibrary/6671465981

• Slide 2: Google calendar & maps

• Slide 4: https://www.datasciencecentral.com/profiles/blogs/140-machine-learning-formulas

• Slide 6: https://pixabay.com/de/bremer-stadtmusikanten-skulptur-2444326/

• Slide 8 & 28: pixabay.com

• Slide 9: thenounproject.com

• Search icon by Luis Prado

• Scales icon by Veronica Karenina

• Evaluation icon by Dinosoft Labs

• Priorities icon by Arthur Shlain

• Prototype icon by asianson design



• Slide 10

• https://pixabay.com/en/book-address-book-learning-learn-1171564/

• https://en.wikipedia.org/wiki/Map#/media/File:World_Map_1689.JPG

• Slide 11: thenounproject.com

• Network icon by Gregor Cresner

• Problem solving icon and razor blade icon by Vector Market

• Bank icon by Stock image photo



• Slide 12: https://pxhere.com/en/photo/536212

• Slide 13

• Bar chart icon: pixabay.com

• Touch icon by Jasfart for thenounproject.com

• Target icon by Libby Ventura for thenounproject.com

• Slide 14: thenounproject.com

• Ground breaking icon by faisalovers

• Trash icon by UNICORN

• Newspaper icon by Aman

• Slide 14: Cat icon: pixabay.com

• Slide 22: https://pxhere.com/en/photo/109282

• Slide 23: Adapted from: http://www.sixsigmadaily.com/impact-effort-matrix/

• Slide 24: https://pixnio.com/objects/computer/programming-code-programmer-coding-coffee-cup-computer-copy-hands-computer-keyboard

• Slide 25: thenounproject.com

• Table icon by Yu luck

• Pi icon by Sumana Chamrumsorakist

• Anaconda icon by parkjisun

• Documents icon by Creative Stall

• Slide 26: thenounproject.com

• Magnifying glass icon by afredocreates.com/icons and flaticons.com

• Table icon by Douglas Santos

• Slide 29: https://commons.wikimedia.org/wiki/File:Pocketwatch_cutaway_drawing.jpg
