Apache Spark Machine Learning Blueprints

Ebook · 543 pages · 11 hours


About this ebook

About This Book
  • Customize Apache Spark and R to fit your analytical needs in customer research, fraud detection, risk analytics, and recommendation engine development
  • Develop a set of practical Machine Learning applications that can be implemented in real-life projects
  • Improve and refine your predictive models for practical implementation with a comprehensive, project-based guide
Who This Book Is For

If you are a data scientist, a data analyst, or an R and SPSS user with a good understanding of machine learning concepts, algorithms, and techniques, then this is the book for you. Some basic understanding of Spark, its core elements, and its applications is required.

Language: English
Release date: May 30, 2016
ISBN: 9781785887789

    Book preview

    Apache Spark Machine Learning Blueprints - Alex Liu

    Table of Contents

    Apache Spark Machine Learning Blueprints

    Credits

    About the Author

    About the Reviewer

    www.PacktPub.com

    eBooks, discount offers, and more

    Why subscribe?

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the color images of this book

    Errata

    Piracy

    Questions

    1. Spark for Machine Learning

    Spark overview and Spark advantages

    Spark overview

    Spark advantages

    Spark computing for machine learning

    Machine learning algorithms

    MLlib

    Other ML libraries

    Spark RDD and dataframes

    Spark RDD

    Spark dataframes

    Dataframes API for R

    ML frameworks, RM4Es and Spark computing

    ML frameworks

    RM4Es

    The Spark computing framework

    ML workflows and Spark pipelines

    ML as a step-by-step workflow

    ML workflow examples

    Spark notebooks

    Notebook approach for ML

    Step 1: Getting the software ready

    Step 2: Installing the Knitr package

    Step 3: Creating a simple report

    Spark notebooks

    Summary

    2. Data Preparation for Spark ML

    Accessing and loading datasets

    Accessing publicly available datasets

    Loading datasets into Spark

    Exploring and visualizing datasets

    Data cleaning

    Dealing with data incompleteness

    Data cleaning in Spark

    Data cleaning made easy

    Identity matching

    Identity issues

    Identity matching on Spark

    Entity resolution

    Short string comparison

    Long string comparison

    Record deduplication

    Identity matching made better

    Crowdsourced deduplication

    Configuring the crowd

    Using the crowd

    Dataset reorganizing

    Dataset reorganizing tasks

    Dataset reorganizing with Spark SQL

    Dataset reorganizing with R on Spark

    Dataset joining

    Dataset joining and its tool – the Spark SQL

    Dataset joining in Spark

    Dataset joining with the R data table package

    Feature extraction

    Feature development challenges

    Feature development with Spark MLlib

    Feature development with R

    Repeatability and automation

    Dataset preprocessing workflows

    Spark pipelines for dataset preprocessing

    Dataset preprocessing automation

    Summary

    3. A Holistic View on Spark

    Spark for a holistic view

    The use case

    Fast and easy computing

    Methods for a holistic view

    Regression modeling

    The SEM approach

    Decision trees

    Feature preparation

    PCA

    Grouping by category to use subject knowledge

    Feature selection

    Model estimation

    MLlib implementation

    The R notebooks' implementation

    Model evaluation

    Quick evaluations

    RMSE

    ROC curves

    Results explanation

    Impact assessments

    Deployment

    Dashboard

    Rules

    Summary

    4. Fraud Detection on Spark

    Spark for fraud detection

    The use case

    Distributed computing

    Methods for fraud detection

    Random forest

    Decision trees

    Feature preparation

    Feature extraction from LogFile

    Data merging

    Model estimation

    MLlib implementation

    R notebooks implementation

    Model evaluation

    A quick evaluation

    Confusion matrix and false positive ratios

    Results explanation

    Big influencers and their impacts

    Deploying fraud detection

    Rules

    Scoring

    Summary

    5. Risk Scoring on Spark

    Spark for risk scoring

    The use case

    Apache Spark notebooks

    Methods of risk scoring

    Logistic regression

    Preparing coding in R

    Random forest and decision trees

    Preparing coding

    Data and feature preparation

    OpenRefine

    Model estimation

    The DataScientistWorkbench for R notebooks

    R notebooks implementation

    Model evaluation

    Confusion matrix

    ROC

    Kolmogorov-Smirnov

    Results explanation

    Big influencers and their impacts

    Deployment

    Scoring

    Summary

    6. Churn Prediction on Spark

    Spark for churn prediction

    The use case

    Spark computing

    Methods for churn prediction

    Regression models

    Decision trees and Random forest

    Feature preparation

    Feature extraction

    Feature selection

    Model estimation

    Spark implementation with MLlib

    Model evaluation

    Results explanation

    Calculating the impact of interventions

    Deployment

    Scoring

    Intervention recommendations

    Summary

    7. Recommendations on Spark

    Apache Spark for a recommendation engine

    The use case

    SPSS on Spark

    Methods for recommendation

    Collaborative filtering

    Preparing coding

    Data treatment with SPSS

    Missing data nodes on SPSS modeler

    Model estimation

    SPSS on Spark – the SPSS Analytics server

    Model evaluation

    Recommendation deployment

    Summary

    8. Learning Analytics on Spark

    Spark for attrition prediction

    The use case

    Spark computing

    Methods of attrition prediction

    Regression models

    About regression

    Preparing for coding

    Decision trees

    Preparing for coding

    Feature preparation

    Feature development

    Feature selection

    Principal components analysis

    Subject knowledge aid

    ML feature selection

    Model estimation

    Spark implementation with the Zeppelin notebook

    Model evaluation

    A quick evaluation

    The confusion matrix and error ratios

    Results explanation

    Calculating the impact of interventions

    Calculating the impact of main causes

    Deployment

    Rules

    Scoring

    Summary

    9. City Analytics on Spark

    Spark for service forecasting

    The use case

    Spark computing

    Methods of service forecasting

    Regression models

    About regression

    Preparing for coding

    Time series modeling

    About time series

    Preparing for coding

    Data and feature preparation

    Data merging

    Feature selection

    Model estimation

    Spark implementation with the Zeppelin notebook

    Spark implementation with the R notebook

    Model evaluation

    RMSE calculation with MLlib

    RMSE calculation with R

    Explanations of the results

    Biggest influencers

    Visualizing trends

    The rules of sending out alerts

    Scores to rank city zones

    Summary

    10. Learning Telco Data on Spark

    Spark for using Telco Data

    The use case

    Spark computing

    Methods for learning from Telco Data

    Descriptive statistics and visualization

    Linear and logistic regression models

    Decision tree and random forest

    Data and feature development

    Data reorganizing

    Feature development and selection

    Model estimation

    SPSS on Spark – SPSS Analytics Server

    Model evaluation

    RMSE calculations with MLlib

    RMSE calculations with R

    Confusion matrix and error ratios with MLlib and R

    Results explanation

    Descriptive statistics and visualizations

    Biggest influencers

    Special insights

    Visualizing trends

    Model deployment

    Rules to send out alerts

    Scores subscribers for churn and for Call Center calls

    Scores subscribers for purchase propensity

    Summary

    11. Modeling Open Data on Spark

    Spark for learning from open data

    The use case

    Spark computing

    Methods for scoring and ranking

    Cluster analysis

    Principal component analysis

    Regression models

    Score resembling

    Data and feature preparation

    Data cleaning

    Data merging

    Feature development

    Feature selection

    Model estimation

    SPSS on Spark – SPSS Analytics Server

    Model evaluation

    RMSE calculations with MLlib

    RMSE calculations with R

    Results explanation

    Comparing ranks

    Biggest influencers

    Deployment

    Rules for sending out alerts

    Scores for ranking school districts

    Summary

    Index

    Apache Spark Machine Learning Blueprints

    Copyright © 2016 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: May 2016

    Production reference: 1250516

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78588-039-1

    www.packtpub.com

    Credits

    Author

    Alex Liu

    Reviewer

    Hao Ren

    Commissioning Editor

    Dipika Gaonkar

    Acquisition Editor

    Meeta Rajani

    Content Development Editor

    Anish Sukumaran

    Technical Editors

    Dhiraj Chandanshive

    Siddhesh Patil

    Copy Editor

    Shruti Iyer

    Project Coordinator

    Izzat Contractor

    Proofreader

    Safis Editing

    Indexer

    Mariammal Chettiyar

    Graphics

    Disha Haria

    Production Coordinator

    Nilesh R. Mohite

    Cover Work

    Nilesh R. Mohite

    About the Author

    Alex Liu is an expert in research methods and data science. He is currently a lead data scientist and one of IBM's leading experts in big data analytics, serving big corporations, developing big data analytics IPs, and speaking at industry conferences such as STRATA, Insights, SMAC, and BigDataCamp. In the past, Alex served as chief or lead data scientist for a few companies, including Yapstone, RS, and TRG. Before this, he was a lead consultant and director at RMA, where he provided data analytics consultation and training to many well-known organizations, including the United Nations, Indymac, AOL, Ingram Micro, GEM, Farmers Insurance, Scripps Networks, Sears, and USAID. At the same time, Dr. Liu taught advanced research methods to PhD candidates at the University of Southern California and the University of California, Irvine. Before that, he worked as a managing director for CATE/GEC and as a research fellow at the Asia/Pacific Research Center at Stanford University. Alex has a PhD in quantitative sociology and a master of science degree in statistical computing from Stanford University.

    I would like to thank IBM for providing a great open and innovative environment to learn and practice Big Data analytics. I would especially like to thank my managers, Kim Siegel and Kevin Zachary, for their support and encouragement, without which it would not have been possible to complete this book.

    I also would like to thank my beautiful wife, Lauria, and two beautiful daughters, Kate and Khloe, for their patience and support, which enabled me to work effectively. Finally, I would like to thank the Packt staff, especially Anish Sukumaran and Meeta Rajani, for making the writing and editing process smooth and joyful.

    About the Reviewer

    Hao Ren is a data engineer working in Paris for leboncoin (https://www.leboncoin.fr/), a classified advertising website that is the fifth most visited site in France. Three years of work experience in functional programming with Scala, machine learning, and distributed systems define his career. Hao's main specialty is machine learning with Apache Spark, including building a crawler detection system, a recommender system, and more. He has also reviewed a more detailed and advanced book from Packt Publishing, Machine Learning with Spark, which is worth a read as well.

    www.PacktPub.com

    eBooks, discount offers, and more

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www2.packtpub.com/books/subscription/packtlib

    Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

    Why subscribe?

    • Fully searchable across every book published by Packt
    • Copy and paste, print, and bookmark content
    • On demand and accessible via a web browser

    Preface

    As data scientists and machine learning professionals, our job is to build models that detect fraud, predict customer churn, or, more broadly, turn data into insights; to do this, we often need to process huge amounts of data and handle complicated computations. We are therefore always excited to see new computing tools, such as Spark, and we spend a lot of time learning about them. Plenty of learning materials are available for these new tools, but most of them take a computing perspective and are written by computer scientists.

    As users of Spark, we data scientists and machine learning professionals are more concerned with how these new systems can help us build models with greater predictive accuracy and how they can make data processing and coding easier for us. This is the main reason this book was developed, and why it was written by a data scientist.

    At the same time, we data scientists and machine learning professionals have already developed our own frameworks and processes, and we use good model-building tools, such as R and SPSS. We understand that some of the new tools, such as Spark's MLlib, may replace certain old tools, but not all of them. Using Spark together with our existing tools is therefore essential to us as Spark users, and it is one of the main focuses of this book; it is also a critical element that makes this book different from other Spark books.

    Overall, this is a Spark book written by a data scientist for data scientists and machine learning professionals, with the aim of making machine learning with Spark easy.

    What this book covers

    Chapter 1, Spark for Machine Learning, introduces Apache Spark from a machine learning perspective. We will discuss Spark DataFrames and R, Spark pipelines, the RM4Es data science framework, as well as Spark notebooks and implementation models.

    Chapter 2, Data Preparation for Spark ML, focuses on data preparation for machine learning on Apache Spark with tools such as Spark SQL. We will discuss data cleaning, identity matching, data merging, and feature development.

    Chapter 3, A Holistic View on Spark, clearly explains the RM4Es machine learning framework and processes with a real-life example, and demonstrates how easily Spark can be used to obtain holistic views for businesses.

    Chapter 4, Fraud Detection on Spark, discusses how Spark makes machine learning for fraud detection easy and fast. At the same time, we will illustrate a step-by-step process of obtaining fraud insights from big data.

    Chapter 5, Risk Scoring on Spark, reviews machine learning methods and processes for a risk scoring project and implements them using R notebooks on Apache Spark in a special DataScientistWorkbench environment. Our focus for this chapter is the notebook.

    Chapter 6, Churn Prediction on Spark, further illustrates our special step-by-step machine learning process on Spark with a focus on using MLlib to develop customer churn predictions to improve customer retention.

    Chapter 7, Recommendations on Spark, describes how to develop recommendations with big data on Spark by utilizing SPSS on the Spark system.

    Chapter 8, Learning Analytics on Spark, extends our applications to learning organizations, such as universities and training institutions, applying machine learning to improve learning analytics in a real case of predicting student attrition.

    Chapter 9, City Analytics on Spark, helps readers gain a better understanding of how Apache Spark can be utilized not only for commercial purposes, but also for public ones, serving cities with a real use case of predicting service requests on Spark.

    Chapter 10, Learning Telco Data on Spark, further extends what was studied in the previous chapters, allowing readers to combine what they have learned and apply dynamic machine learning to huge amounts of Telco Data on Spark.

    Chapter 11, Modeling Open Data on Spark, presents dynamic machine learning with open data on Spark, for which users can take a data-driven approach and utilize all the available technologies for optimal results. This chapter is an extension of Chapter 9, City Analytics on Spark, and Chapter 10, Learning Telco Data on Spark, as well as a good review of all the previous chapters through a real-life project.

    What you need for this book

    Throughout this book, we assume that you have some basic programming experience in either Scala or Python, some basic experience with modeling tools, such as R or SPSS, and some basic knowledge of machine learning and data science.

    Who this book is for

    This book is written for analysts, data scientists, researchers, and machine learning professionals who need to process Big Data but who are not necessarily familiar with Spark.

    Conventions

    In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "In R, the forecast package has an accuracy function that can be used to calculate forecasting accuracy."
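
    For instance, here is a minimal sketch of what that sentence describes, assuming the forecast package is installed and using R's built-in AirPassengers series:

    library(forecast)

    # Fit an ARIMA model to the built-in monthly AirPassengers series
    fit <- auto.arima(AirPassengers)

    # Report accuracy measures such as RMSE, MAE, and MAPE for the fit
    accuracy(fit)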

    A block of code is set as follows:

    # Read a JSON dataset into a DataFrame: the data format is JSON,
    # the sampling ratio for schema inference is set to 1%, and the
    # data name and location are given in load()
    df1 = sqlContext.read \
        .format("json") \
        .option("samplingRatio", "0.01") \
        .load("/home/alex/data1.json")

    Any command-line input or output is written as follows:

    sqlContext <- sparkRSQL.init(sc)
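
    Once this SQLContext exists on the R side, the same hypothetical JSON dataset from the Python block above could also be loaded through SparkR. The following is a minimal sketch, assuming the Spark 1.x SparkR API and the illustrative path used earlier:

    # Load the JSON dataset as a SparkR DataFrame via the SQLContext
    df1 <- read.df(sqlContext, "/home/alex/data1.json", source = "json")

    # Inspect the inferred schema and preview the first few rows
    printSchema(df1)
    head(df1)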

    New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Users can click on Create new note, which is the first line under Notebook in the first left-hand column."

    Note

    Warnings or important notes appear in a box like this.

    Tip

    Tips and tricks appear like this.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

    To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book's title in the subject of your message.

    If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

    Downloading the color images of this book

    We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from http://www.packtpub.com/sites/default/files/downloads/ApacheSparkMachineLearningBlueprints_ColorImages.pdf.

    Errata

    Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

    To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

    Piracy

    Piracy of copyrighted material on the Internet is an ongoing problem across all media.
