Ebook299 pages1 hour

Apache Oozie Essentials

Name: Apache Oozie Essentials
Author: Singh Jagat Jasjit
ISBN: 9781785888465

By Singh Jagat Jasjit

Rating: 0 out of 5 stars

()

Read preview

About this ebook

Unleash the power of Apache Oozie to create and manage your big data and machine learning pipelines in one go

About This Book

- Teaches you everything you need to know to get started with Apache Oozie from scratch and manage your data pipelines effortlessly
- Learn to write data ingestion workflows with the help of real-life examples from the author’s own personal experience
- Embed Spark jobs to run your machine learning models on top of Hadoop

Who This Book Is For

If you are an expert Hadoop user who wants to use Apache Oozie to handle workflows efficiently, this book is for you. This book will be handy to anyone who is familiar with the basics of Hadoop and wants to automate data and machine learning pipelines.

What You Will Learn

- Install and configure Oozie from source code on your Hadoop cluster
- Dive into the world of Oozie with Java MapReduce jobs
- Schedule Hive ETL and data ingestion jobs
- Import data from a database through Sqoop jobs in HDFS
- Create and process data pipelines with Pig, hive scripts as per business requirements.
- Run machine learning Spark jobs on Hadoop
- Create quick Oozie jobs using Hue
- Make the most of Oozie’s security capabilities by configuring Oozie’s security

In Detail

As more and more organizations are discovering the use of big data analytics, interest in platforms that provide storage, computation, and analytic capabilities is booming exponentially. This calls for data management. Hadoop caters to this need. Oozie fulfils this necessity for a scheduler for a Hadoop job by acting as a cron to better analyze data.
Apache Oozie Essentials starts off with the basics right from installing and configuring Oozie from source code on your Hadoop cluster to managing your complex clusters. You will learn how to create data ingestion and machine learning workflows.
This book is sprinkled with the examples and exercises to help you take your big data learning to the next level. You will discover how to write workflows to run your MapReduce, Pig ,Hive, and Sqoop scripts and schedule them to run at a specific time or for a specific business requirement using a coordinator. This book has engaging real-life exercises and examples to get you in the thick of things. Lastly, you’ll get a grip of how to embed Spark jobs, which can be used to run your machine learning models on Hadoop.
By the end of the book, you will have a good knowledge of Apache Oozie. You will be capable of using Oozie to handle large Hadoop workflows and even improve the availability of your Hadoop environment.

Style and approach

This book is a hands-on guide that explains Oozie using real-world examples. Each chapter is blended beautifully with fundamental concepts sprinkled in-between case study solution algorithms and topped off with self-learning exercises.

Skip carousel

LanguageEnglish

PublisherPackt Publishing

Release dateDec 11, 2015

ISBN9781785888465

Author

Singh Jagat Jasjit

Related authors

Skip carousel

Related to Apache Oozie Essentials

Related ebooks

Skip carousel

Hadoop Cluster Deployment
Ebook
Hadoop Cluster Deployment
byDanil Zburivsky
Rating: 0 out of 5 stars
0 ratings
Monitoring Hadoop
Ebook
Monitoring Hadoop
byGurmukh Singh
Rating: 0 out of 5 stars
0 ratings
Apache Hive Essentials
Ebook
Apache Hive Essentials
byDayong Du
Rating: 0 out of 5 stars
0 ratings
Securing Hadoop
Ebook
Securing Hadoop
bySudheesh Narayanan
Rating: 4 out of 5 stars
4/5
Python Data Persistence
Ebook
Python Data Persistence
byMalhar Lathkar
Rating: 0 out of 5 stars
0 ratings
Optimizing Hadoop for MapReduce
Ebook
Optimizing Hadoop for MapReduce
byKhaled Tannir
Rating: 0 out of 5 stars
0 ratings
Apache Mahout Clustering Designs
Ebook
Apache Mahout Clustering Designs
byGupta Ashish
Rating: 0 out of 5 stars
0 ratings
Learning Apache Mahout Classification
Ebook
Learning Apache Mahout Classification
byGupta Ashish
Rating: 0 out of 5 stars
0 ratings
Mastering Scala Machine Learning
Ebook
Mastering Scala Machine Learning
byAlex Kozlov
Rating: 0 out of 5 stars
0 ratings
HBase High Performance Cookbook
Ebook
HBase High Performance Cookbook
byRuchir Choudhry
Rating: 0 out of 5 stars
0 ratings
Administrating Solr
Ebook
Administrating Solr
bySurendra Mohan
Rating: 0 out of 5 stars
0 ratings
Practical OneOps
Ebook
Practical OneOps
byNilesh Nimkar
Rating: 0 out of 5 stars
0 ratings
Data Processing and Modeling with Hadoop: Mastering Hadoop Ecosystem Including ETL, Data Vault, DMBok, GDPR, and Various Data-Centric Tools
Ebook
Data Processing and Modeling with Hadoop: Mastering Hadoop Ecosystem Including ETL, Data Vault, DMBok, GDPR, and Various Data-Centric Tools
byVinicius Aquino do Vale
Rating: 0 out of 5 stars
0 ratings
Apache Spark 2.x Cookbook
Ebook
Apache Spark 2.x Cookbook
byRishi Yadav
Rating: 0 out of 5 stars
0 ratings
Building Python Real-Time Applications with Storm
Ebook
Building Python Real-Time Applications with Storm
byBhatnagar Kartik
Rating: 0 out of 5 stars
0 ratings
Learning SaltStack
Ebook
Learning SaltStack
byColton Myers
Rating: 4 out of 5 stars
4/5
Cloud Development and Deployment with CloudBees
Ebook
Cloud Development and Deployment with CloudBees
byNicolas De loof
Rating: 0 out of 5 stars
0 ratings
Getting Started with Hazelcast - Second Edition
Ebook
Getting Started with Hazelcast - Second Edition
byMat Johns
Rating: 0 out of 5 stars
0 ratings
Instant MapReduce Patterns – Hadoop Essentials How-to
Ebook
Instant MapReduce Patterns – Hadoop Essentials How-to
bySrinath Perera
Rating: 0 out of 5 stars
0 ratings
PostgreSQL Development Essentials
Ebook
PostgreSQL Development Essentials
byManpreet Kaur
Rating: 5 out of 5 stars
5/5
Learning Azure DocumentDB
Ebook
Learning Azure DocumentDB
byBecker Riccardo
Rating: 0 out of 5 stars
0 ratings
Monitoring Elasticsearch
Ebook
Monitoring Elasticsearch
byDan Noble
Rating: 0 out of 5 stars
0 ratings
OpenStack Sahara Essentials
Ebook
OpenStack Sahara Essentials
byOmar Khedher
Rating: 0 out of 5 stars
0 ratings
Monitoring Docker
Ebook
Monitoring Docker
byMcKendrick Russ
Rating: 0 out of 5 stars
0 ratings
Hadoop Essentials
Ebook
Hadoop Essentials
byShiva Achari
Rating: 5 out of 5 stars
5/5
Scaling Big Data with Hadoop and Solr - Second Edition
Ebook
Scaling Big Data with Hadoop and Solr - Second Edition
byHrishikesh Vijay Karambelkar
Rating: 0 out of 5 stars
0 ratings
HBase Essentials
Ebook
HBase Essentials
byNishant Garg
Rating: 0 out of 5 stars
0 ratings
Neo4j High Performance
Ebook
Neo4j High Performance
bySonal Raj
Rating: 0 out of 5 stars
0 ratings
Elasticsearch Blueprints
Ebook
Elasticsearch Blueprints
byVineeth Mohan
Rating: 0 out of 5 stars
0 ratings
CentOS High Performance
Ebook
CentOS High Performance
byCánepa Gabriel
Rating: 0 out of 5 stars
0 ratings

Computers For You

Skip carousel

Mastering ChatGPT: 21 Prompts Templates for Effortless Writing
Ebook
Mastering ChatGPT: 21 Prompts Templates for Effortless Writing
byCea West
Rating: 5 out of 5 stars
5/5
Deep Search: How to Explore the Internet More Effectively
Ebook
Deep Search: How to Explore the Internet More Effectively
byAlan Pearce
Rating: 5 out of 5 stars
5/5
Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
Ebook
Data Science from Scratch: The #1 Data Science Guide for Everything A Data Scientist Needs to Know: Python, Linear Algebra, Statistics, Coding, Applications, Neural Networks, and Decision Trees
bySteven Cooper
Rating: 4 out of 5 stars
4/5
How to Create Cpn Numbers the Right way: A Step by Step Guide to Creating cpn Numbers Legally
Ebook
How to Create Cpn Numbers the Right way: A Step by Step Guide to Creating cpn Numbers Legally
byAlex Parkinson
Rating: 4 out of 5 stars
4/5
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
Ebook
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
byCea West
Rating: 4 out of 5 stars
4/5
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
Ebook
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
byWalter Shields
Rating: 4 out of 5 stars
4/5
Practical Lock Picking: A Physical Penetration Tester's Training Guide
Ebook
Practical Lock Picking: A Physical Penetration Tester's Training Guide
byDeviant Ollam
Rating: 5 out of 5 stars
5/5
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
Ebook
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
bySteven Cooper
Rating: 4 out of 5 stars
4/5
AI Crash Course: A fun and hands-on introduction to machine learning, reinforcement learning, deep learning, and artificial intelligence with Python
Ebook
AI Crash Course: A fun and hands-on introduction to machine learning, reinforcement learning, deep learning, and artificial intelligence with Python
byHadelin de Ponteves
Rating: 0 out of 5 stars
0 ratings
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
Ebook
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
byNigel Tillery
Rating: 0 out of 5 stars
0 ratings
The Simulation Hypothesis: An MIT Computer Scientist Shows Why AI, Quantum Physics and Eastern Mystics All Agree We Are In a Video Game
Ebook
The Simulation Hypothesis: An MIT Computer Scientist Shows Why AI, Quantum Physics and Eastern Mystics All Agree We Are In a Video Game
byRizwan Virk
Rating: 5 out of 5 stars
5/5
The Professional Voiceover Handbook: Voiceover training, #1
Ebook
The Professional Voiceover Handbook: Voiceover training, #1
byPeter Baker
Rating: 5 out of 5 stars
5/5
Computer Science: A Concise Introduction
Ebook
Computer Science: A Concise Introduction
byIan Sinclair
Rating: 4 out of 5 stars
4/5
The Insider's Guide to Technical Writing
Ebook
The Insider's Guide to Technical Writing
byKrista Van Laan
Rating: 0 out of 5 stars
0 ratings
The Mega Box: The Ultimate Guide to the Best Free Resources on the Internet
Ebook
The Mega Box: The Ultimate Guide to the Best Free Resources on the Internet
byChris Mason
Rating: 4 out of 5 stars
4/5
Elon Musk
Ebook
Elon Musk
byWalter Isaacson
Rating: 4 out of 5 stars
4/5
Grokking Algorithms: An illustrated guide for programmers and other curious people
Ebook
Grokking Algorithms: An illustrated guide for programmers and other curious people
byAditya Bhargava
Rating: 4 out of 5 stars
4/5
Python Machine Learning By Example
Ebook
Python Machine Learning By Example
byYuxi (Hayden) Liu
Rating: 4 out of 5 stars
4/5
YouTube: How to Build and Optimize Your First YouTube Channel, Marketing, SEO, Tips and Strategies for YouTube Channel Success
Ebook
YouTube: How to Build and Optimize Your First YouTube Channel, Marketing, SEO, Tips and Strategies for YouTube Channel Success
byTommy Swindali
Rating: 4 out of 5 stars
4/5
Procreate for Beginners: Introduction to Procreate for Drawing and Illustrating on the iPad
Ebook
Procreate for Beginners: Introduction to Procreate for Drawing and Illustrating on the iPad
byAaron Smith
Rating: 0 out of 5 stars
0 ratings
ChatGPT Ultimate User Guide - How to Make Money Online Faster and More Precise Using AI Technology
Ebook
ChatGPT Ultimate User Guide - How to Make Money Online Faster and More Precise Using AI Technology
byMaximus Wilson
Rating: 0 out of 5 stars
0 ratings
The ChatGPT Millionaire Handbook: Make Money Online With the Power of AI Technology
Ebook
The ChatGPT Millionaire Handbook: Make Money Online With the Power of AI Technology
byTJ Books
Rating: 0 out of 5 stars
0 ratings
Summary of Dotcom Secrets: by Russell Brunson - The Underground Playbook for Growing Your Company Online with Sales Funnels - A Comprehensive Summary
Ebook
Summary of Dotcom Secrets: by Russell Brunson - The Underground Playbook for Growing Your Company Online with Sales Funnels - A Comprehensive Summary
byAlexander Cooper
Rating: 5 out of 5 stars
5/5
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
Ebook
The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
byRalph Kimball
Rating: 0 out of 5 stars
0 ratings
User Friendly: How the Hidden Rules of Design Are Changing the Way We Live, Work, and Play
Ebook
User Friendly: How the Hidden Rules of Design Are Changing the Way We Live, Work, and Play
byCliff Kuang
Rating: 4 out of 5 stars
4/5
101 Awesome Builds: Minecraft® Secrets from the World's Greatest Crafters
Ebook
101 Awesome Builds: Minecraft® Secrets from the World's Greatest Crafters
byTriumph Books
Rating: 4 out of 5 stars
4/5
The Invisible Rainbow: A History of Electricity and Life
Ebook
The Invisible Rainbow: A History of Electricity and Life
byArthur Firstenberg
Rating: 4 out of 5 stars
4/5
CompTIA Security+ Practice Questions
Ebook
CompTIA Security+ Practice Questions
byIP Specialist
Rating: 2 out of 5 stars
2/5
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
Ebook
Python for Beginners. A Smarter Way to Learn Python in 5 Days and Remember it Longer. With Easy Step by Step Guidance and Hands on Examples. (Python Crash Course-Programming for Beginners)
byArthur T. Brooks
Rating: 0 out of 5 stars
0 ratings
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
Ebook
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
bySeth Stephens-Davidowitz
Rating: 4 out of 5 stars
4/5

Related podcast episodes

Skip carousel

Ali Ghodsi – The Past, Present, and Future of Big Data – [Founder’s Field Guide, EP.18]: My Guest today is Ali Ghodsi, founder and CEO of Databricks, a data analytics platform for data scientists and developers. He's also the founder of Apache Spark, the open-source project that Databricks is built on, and is an accomplished researcher at...
Podcast episode
Ali Ghodsi – The Past, Present, and Future of Big Data – [Founder’s Field Guide, EP.18]: My Guest today is Ali Ghodsi, founder and CEO of Databricks, a data analytics platform for data scientists and developers. He's also the founder of Apache Spark, the open-source project that Databricks is built on, and is an accomplished researcher at...
byInvest Like the Best with Patrick O'Shaughnessy
0 ratings
0% found this document useful
25: Selenium, pytest, Mozilla – Dave Hunt: Interview with Dave Hunt @davehunt82. We Cover: Selenium Driver: http://www.seleniumhq.org/ pytest: http://docs.pytest.org/ pytest plugins: pytest-selenium: http://pytest-selenium.readthedocs.io/ pytest-html: https://pypi.python.
Podcast episode
25: Selenium, pytest, Mozilla – Dave Hunt: Interview with Dave Hunt @davehunt82. We Cover: Selenium Driver: http://www.seleniumhq.org/ pytest: http://docs.pytest.org/ pytest plugins: pytest-selenium: http://pytest-selenium.readthedocs.io/ pytest-html: https://pypi.python.
byTest and Code
0 ratings
0% found this document useful
CockroachDB In Depth with Peter Mattis - Episode 35
Podcast episode
CockroachDB In Depth with Peter Mattis - Episode 35
byData Engineering Podcast
0 ratings
0% found this document useful
A Multipurpose Database For Transactions And Analytics To Simplify Your Data Architecture With Singlestore: An interview with Shireesh Thota about how the Singlestore database engine allows you to reduce architectural sprawl in your data systems by combining performant and scalable transactional and analytical capabilities into a single platform
Podcast episode
A Multipurpose Database For Transactions And Analytics To Simplify Your Data Architecture With Singlestore: An interview with Shireesh Thota about how the Singlestore database engine allows you to reduce architectural sprawl in your data systems by combining performant and scalable transactional and analytical capabilities into a single platform
byData Engineering Podcast
0 ratings
0% found this document useful
Stateful, Distributed Stream Processing on Flink with Fabian Hueske - Episode 57: Scalable and Stateful Streaming Data With Apache Flink (Interview)
Podcast episode
Stateful, Distributed Stream Processing on Flink with Fabian Hueske - Episode 57: Scalable and Stateful Streaming Data With Apache Flink (Interview)
byData Engineering Podcast
0 ratings
0% found this document useful
Reflections On Designing A Data Platform From Scratch: A monologue by Tobias Macey, the host of the show, about the design considerations involved in building a data platform and how the lessons learned from running the Data Engineering Podcast are influencing the choices made.
Podcast episode
Reflections On Designing A Data Platform From Scratch: A monologue by Tobias Macey, the host of the show, about the design considerations involved in building a data platform and how the lessons learned from running the Data Engineering Podcast are influencing the choices made.
byData Engineering Podcast
100%
100% found this document useful
A New Distributed Cloud Architecture
Podcast episode
A New Distributed Cloud Architecture
byThe Cloudcast
0 ratings
0% found this document useful
Level Up Your Data Platform With Active Metadata: A conversation with Atlan co-founder Prukalpa Sankar about the idea of active metadata and how it can reduce the toil involved in managing a data platform
Podcast episode
Level Up Your Data Platform With Active Metadata: A conversation with Atlan co-founder Prukalpa Sankar about the idea of active metadata and how it can reduce the toil involved in managing a data platform
byData Engineering Podcast
0 ratings
0% found this document useful
153: Playwright for Python: end to end testing of web apps - Ryan Howard: Playwright is an end to end automated testing framework for web apps with Python support and even a pytest plugin.
Podcast episode
153: Playwright for Python: end to end testing of web apps - Ryan Howard: Playwright is an end to end automated testing framework for web apps with Python support and even a pytest plugin.
byTest and Code
0 ratings
0% found this document useful
433: Falling for FastAPI: Mike's falling in love with FastAPI and gives us a hint at the next project he's building.
Podcast episode
433: Falling for FastAPI: Mike's falling in love with FastAPI and gives us a hint at the next project he's building.
byCoder Radio
0 ratings
0% found this document useful
55: Go on The Web: Summary Andrew Gerrand (@enneff), Developer Advocate at Google & Go core contributor, talks about GoLang and how it is being used in Web Development today as well as the plans for the future of the Go as a platform for the web. Resources Go...
Podcast episode
55: Go on The Web: Summary Andrew Gerrand (@enneff), Developer Advocate at Google & Go core contributor, talks about GoLang and how it is being used in Web Development today as well as the plans for the future of the Go as a platform for the web. Resources Go...
byThe Web Platform Podcast
100%
100% found this document useful
79: BI Brainz Masterclass with Raquel Seville and Anna Ria: Welcome to another exciting masterclass, taking you behind the scenes of how two female leaders on our BI Brainz team created not one, but two of our most downloaded database templates ever! Sharing these juicy details, will be Raquel Seville, CEO of...
Podcast episode
79: BI Brainz Masterclass with Raquel Seville and Anna Ria: Welcome to another exciting masterclass, taking you behind the scenes of how two female leaders on our BI Brainz team created not one, but two of our most downloaded database templates ever! Sharing these juicy details, will be Raquel Seville, CEO of...
byAnalytics on Fire
0 ratings
0% found this document useful
Taking A Tour Of PostgreSQL with Jonathan Katz - Episode 42: A Whirlwind Tour Of The PostgreSQL Database (Interview)
Podcast episode
Taking A Tour Of PostgreSQL with Jonathan Katz - Episode 42: A Whirlwind Tour Of The PostgreSQL Database (Interview)
byData Engineering Podcast
100%
100% found this document useful
Flightcontrol and Going Beyond Heroku with Brandon Bayer: A platform as a service, or PaaS, is the concept of a complete development and deployment environment in the cloud. One of the best examples is Heroku, which was created in 2007 and later acquired by Salesforce.
Podcast episode
Flightcontrol and Going Beyond Heroku with Brandon Bayer: A platform as a service, or PaaS, is the concept of a complete development and deployment environment in the cloud. One of the best examples is Heroku, which was created in 2007 and later acquired by Salesforce.
byCloud Engineering Archives - Software Engineering Daily
0 ratings
0% found this document useful
Observability with Eduardo Silva: There are hundreds of observability companies out there, and many ways to think about observability, such as application performance monitoring, server monitoring, and tracing. In a production application, multiple tools are often needed to get proper ...
Podcast episode
Observability with Eduardo Silva: There are hundreds of observability companies out there, and many ways to think about observability, such as application performance monitoring, server monitoring, and tracing. In a production application, multiple tools are often needed to get proper ...
byCloud Engineering Archives - Software Engineering Daily
0 ratings
0% found this document useful
69: Testing Front End Code: Summary Oren Rubin (@Shexman) goes through why it’s important to not only test the back-end code of our applications but also to test our Front End code, the integration points, and the full user experience. Oren also goes through...
Podcast episode
69: Testing Front End Code: Summary Oren Rubin (@Shexman) goes through why it’s important to not only test the back-end code of our applications but also to test our Front End code, the integration points, and the full user experience. Oren also goes through...
byThe Web Platform Podcast
0 ratings
0% found this document useful
Potluck — Corn Shucking × Self-Hosting Images × WordPress × Getting Scammed × Portfolios: It’s another Potluck! In this episode, Scott and Wes answer your questions about corn shucking, self-hosting images, WordPress, getting scammed, portfolios, more! Linode - Sponsor Whether you’re working on a personal project or managing...
Podcast episode
Potluck — Corn Shucking × Self-Hosting Images × WordPress × Getting Scammed × Portfolios: It’s another Potluck! In this episode, Scott and Wes answer your questions about corn shucking, self-hosting images, WordPress, getting scammed, portfolios, more! Linode - Sponsor Whether you’re working on a personal project or managing...
bySyntax - Tasty Web Development Treats
0 ratings
0% found this document useful
70: Web Components at Microsoft: Summary Daniel Buchner (@csuwildcat), former Mozillian & Program Manager at Microsoft takes us through the plans for Web Components at Microsoft. Daniel is the creator of the Web Components free open source library, X-Tag which Microsoft is now...
Podcast episode
70: Web Components at Microsoft: Summary Daniel Buchner (@csuwildcat), former Mozillian & Program Manager at Microsoft takes us through the plans for Web Components at Microsoft. Daniel is the creator of the Web Components free open source library, X-Tag which Microsoft is now...
byThe Web Platform Podcast
0 ratings
0% found this document useful
Opening AI's Black Box with Prof. David Bau, Koyena Pal, and Eric Todd of Northeastern University: In this episode, we dive deep into the inner workings of large language models with Professor David Bau and grad students Koyena Pal and Eric Todd from Northeastern University.
Podcast episode
Opening AI's Black Box with Prof. David Bau, Koyena Pal, and Eric Todd of Northeastern University: In this episode, we dive deep into the inner workings of large language models with Professor David Bau and grad students Koyena Pal and Eric Todd from Northeastern University.
by"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis
0 ratings
0% found this document useful
Breaking the Tech Mold with Stephanie Wong: Stephanie Wong, Head of Developer Engagement at Google, joins Corey this week to chat about her wide ranging work. Of which a significant and perhaps more interesting amount occurs outside the walls of Google. While Stephanie is certainly a key player at
Podcast episode
Breaking the Tech Mold with Stephanie Wong: Stephanie Wong, Head of Developer Engagement at Google, joins Corey this week to chat about her wide ranging work. Of which a significant and perhaps more interesting amount occurs outside the walls of Google. While Stephanie is certainly a key player at
byScreaming in the Cloud
0 ratings
0% found this document useful
Prepare Your Unstructured Data For Machine Learning And Computer Vision Without The Toil Using Activeloop: An interview with Davit Buniatyan about his work on Activeloop and the open source Hub framework for reducing the toil involved in getting your unstructured data ready for computer vision and machine learning projects.
Podcast episode
Prepare Your Unstructured Data For Machine Learning And Computer Vision Without The Toil Using Activeloop: An interview with Davit Buniatyan about his work on Activeloop and the open source Hub framework for reducing the toil involved in getting your unstructured data ready for computer vision and machine learning projects.
byData Engineering Podcast
0 ratings
0% found this document useful
Developing Storage Solutions Before the Rest with AB Periasamay: Conversations about what the cloud is might be an infinitely convoluted one, but some are taking the conversation down paths less traveled. That is certainly the case for AB Periasamy, CEO and Co-Founder of MinIO, an open source provider of high performan
Podcast episode
Developing Storage Solutions Before the Rest with AB Periasamay: Conversations about what the cloud is might be an infinitely convoluted one, but some are taking the conversation down paths less traveled. That is certainly the case for AB Periasamy, CEO and Co-Founder of MinIO, an open source provider of high performan
byScreaming in the Cloud
0 ratings
0% found this document useful
How Redpanda Extracts Business Value from Data Events with Alex Gallego
Podcast episode
How Redpanda Extracts Business Value from Data Events with Alex Gallego
byScreaming in the Cloud
0 ratings
0% found this document useful
How Hera is an Enabler of MLOps Integrations // Flaviu Vadan // Coffee Sessions #115
Podcast episode
How Hera is an Enabler of MLOps Integrations // Flaviu Vadan // Coffee Sessions #115
byMLOps.community
0 ratings
0% found this document useful
Episode 60. All your Containers Are Belong to Us (An intro to Docker): So you have heard about it, and probably ran into it already. Docker is a super cool tech that let us create / manage and deploy applications (It is really what would come out if Devs and Ops decided to have a kid). Come hear how you can too master...
Podcast episode
Episode 60. All your Containers Are Belong to Us (An intro to Docker): So you have heard about it, and probably ran into it already. Docker is a super cool tech that let us create / manage and deploy applications (It is really what would come out if Devs and Ops decided to have a kid). Come hear how you can too master...
byJava Pub House
0 ratings
0% found this document useful
68: Ember 2 & The Ember Community: Summary Ember community leaders Audrey Listochkin (@listochkin) & Robert Jackson (@rwjblue) talk with us about the long awaited Ember 2 release and the Ember community across the globe. The future of Ember is larger than this 2.x release and...
Podcast episode
68: Ember 2 & The Ember Community: Summary Ember community leaders Audrey Listochkin (@listochkin) & Robert Jackson (@rwjblue) talk with us about the long awaited Ember 2 release and the Ember community across the globe. The future of Ember is larger than this 2.x release and...
byThe Web Platform Podcast
0 ratings
0% found this document useful
The Art and Science of Database Innovation with Andi Gutmans: Andi Gutmans, General Manager and Vice President, Engineering at Google, joins Corey on Screaming in the Cloud to discuss all things database innovation at Google Cloud. Andi explains why significant surges of customers are switching from legacy proprieta
Podcast episode
The Art and Science of Database Innovation with Andi Gutmans: Andi Gutmans, General Manager and Vice President, Engineering at Google, joins Corey on Screaming in the Cloud to discuss all things database innovation at Google Cloud. Andi explains why significant surges of customers are switching from legacy proprieta
byScreaming in the Cloud
0 ratings
0% found this document useful
MLOps Meetup #24 // How to Become a Better Data Scientist: The Definite Guide // Alexey Grigorev
Podcast episode
MLOps Meetup #24 // How to Become a Better Data Scientist: The Definite Guide // Alexey Grigorev
byMLOps.community
0 ratings
0% found this document useful
72: Teaching and Learning Angular: Summary Kent C. Dodds (@kentcdodds) & Shai Reznik (@shai_reznik) join us for episode 72 about teaching and learning the popular Angular JavaScript Framework. These two veteran technologists provide great insights into how they teach code, what...
Podcast episode
72: Teaching and Learning Angular: Summary Kent C. Dodds (@kentcdodds) & Shai Reznik (@shai_reznik) join us for episode 72 about teaching and learning the popular Angular JavaScript Framework. These two veteran technologists provide great insights into how they teach code, what...
byThe Web Platform Podcast
0 ratings
0% found this document useful
Oracle Data Lakehouse: With each passing day, more and more data sources are sending greater volumes of data across the globe. For any organization, this combination of structured and unstructured data continues to be a challenge. Data lakehouses link, correlate, and...
Podcast episode
Oracle Data Lakehouse: With each passing day, more and more data sources are sending greater volumes of data across the globe. For any organization, this combination of structured and unstructured data continues to be a challenge. Data lakehouses link, correlate, and...
byOracle University Podcast
0 ratings
0% found this document useful

Skip carousel

Join the Pod, Man!
Linux Format
Article
Join the Pod, Man!
May 30, 2023
8 min read
Docker vs Podman
APC
Article
Docker vs Podman
Apr 19, 2021
When Cockpit was first developed, it had plug-in support for administering your Docker containers remotely via its user-friendly web interface. But then Red Hat OS became a major backer of Cockpit, and when Red Hat developed its own alternative to Do
1 min read
Grafana, Telegraf And Influxdb
Linux Format
Article
Grafana, Telegraf And Influxdb
Jun 30, 2020
If you don’t like Netdata or if you want to try something else, you can give Grafana (https://grafana.com), Telegraf (www.influxdata.com/time-series-platform/telegraf) and InfluxDB (www.influxdata.com/products/influxdb-overview) a try. Grafana can’t
1 min read
Build a Better nginx Reverse Proxy
Maximum PC
Article
Build a Better nginx Reverse Proxy
Feb 4, 2020
4 min read
CSV Handling
Linux Format
Article
CSV Handling
Mar 10, 2020
3 min read
Grafana Terminology
Linux Format
Article
Grafana Terminology
Jan 14, 2020
A Grafana data source is a database, file or service that provides data to Grafana – it cannot operate without data. A Grafana panel is the basic building block of Grafana. Panels are made of visualisations or queries. A Grafana query is used for req
1 min read
Automatically Provision Devices With Ansible
Linux Format
Article
Automatically Provision Devices With Ansible
Nov 15, 2022
Matt Holder has worked in IT support for over a decade, and always tries to utilise Linux alongside other installed systems. C loud computing is a term that means a number of things. Software as a Service (SaaS) is one such example of what can be hos
9 min read
Your First Steps In Grafana
Linux Format
Article
Your First Steps In Grafana
Nov 17, 2020
The easiest way to get hold of Grafana and begin using it as soon as possible is by downloading and executing its official Docker image. This means that apart from the Docker image, you won’t need to download, set up or install anything else for Graf
1 min read
Monitor Systems And Docker Deployments
Linux Format
Article
Monitor Systems And Docker Deployments
Jun 30, 2020
Welcome to Netdata, software for distributed real-time performance and health monitoring of UNIX machines. Don’t you dare turn that page! A key advantage of Netdata is that it collects all of its metrics without introducing too much load on to the Li
8 min read
About Kibana
Linux Format
Article
About Kibana
Mar 10, 2020
Kibana offers analytics and a search dashboard for Elasticsearch, as well as visualisation capabilities for data stored in Elasticsearch. Kibana is so handy that it would be a shame to use Elasticsearch without combining it with Kibana. Generally spe
1 min read
Visualising In Grafana
Linux Format
Article
Visualising In Grafana
May 4, 2021
1 min read
Filesystems
Linux Format
Article
Filesystems
Nov 16, 2021
1 min read
Elasticsearch And Kibana Basics
Linux Format
Article
Elasticsearch And Kibana Basics
Dec 15, 2020
1 min read
Updating Sites
Linux Format
Article
Updating Sites
Oct 19, 2021
1 min read
Can I Use Python 2 In Maya 2022?
3D World
Article
Can I Use Python 2 In Maya 2022?
Aug 10, 2021
1 min read
Build A Linux Smart Home Office
APC
Article
Build A Linux Smart Home Office
Feb 22, 2021
18 min read
Artificial Intelligence Rules Of The Road
Linux Format
Article
Artificial Intelligence Rules Of The Road
Nov 14, 2023
AI FOR ALL! Anyone who works with computers needs to understand that AI will undoubtedly change how work is executed. That said, I don’t think we are anywhere near the much bleated “Everyone will lose their jobs!” IT-related jobs will change but they
2 min read
Set Up A Production- Ready Web Server
APC
Article
Set Up A Production- Ready Web Server
Nov 4, 2019
8 min read
FLASK Web Frameworks
Linux Format
Article
FLASK Web Frameworks
Jun 4, 2019
The main focus of Python has always been to get you cracking on with your coding – the language was never made for web programming. However, this has just made it more interesting to extend the language for the web, or to create an interface to web-b
9 min read
The Smart Home Dream
MacLife
Article
The Smart Home Dream
Feb 2, 2021
4 min read
Get To Grips With Kali
Linux Format
Article
Get To Grips With Kali
May 2, 2023
5 min read
Other Cool Stuff You Can Do
MacLife
Article
Other Cool Stuff You Can Do
Mar 26, 2024
3 min read
HotPicks
Linux Format
Article
HotPicks
Jun 27, 2023
12 min read
Other Cool Stuff You Can Do
MacFormat
Article
Other Cool Stuff You Can Do
Apr 2, 2024
3 min read
Personal Cloud Servers
Linux Format
Article
Personal Cloud Servers
Sep 19, 2023
1 min read
HotPicks
Linux Format
Article
HotPicks
Nov 15, 2022
12 min read
The Smart Home Dream
MacFormat
Article
The Smart Home Dream
Dec 15, 2020
7 min read
Newsdesk
Linux Format
Article
Newsdesk
Dec 13, 2022
OPEN SOURCE FUNDING GitHub, Fastly and Mozilla are all looking for new projects to back, giving a boost to open source development. Small open source projects might be created solely by enthusiasts but most make use of outside developers, often paid
9 min read
Contributing For Non - Coders
Linux Format
Article
Contributing For Non - Coders
Jan 10, 2023
9 min read
Set Up A Production-ready Web Server
Linux Format
Article
Set Up A Production-ready Web Server
Sep 24, 2019
8 min read

Related categories

Skip carousel

Reviews for Apache Oozie Essentials

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Apache Oozie Essentials - Singh Jagat Jasjit

Apache Oozie Essentials

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Setting up Oozie

Configuring Oozie in Hortonworks distribution

Installing Oozie using tar ball

Creating a test virtual machine

Building Oozie source code

Summary of the build script

Codehaus Maven move

Download dependency jars

Preparing to create a WAR file

Create a WAR file

Configure Oozie MySQL database

Configure the shared library

Start server testing and verification

Summary

2. My First Oozie Job

Installing and configuring Hue

Oozie concepts

Workflows

Coordinator

Bundles

Book case study

Running our first Oozie job

Types of nodes

Control flow nodes

Action nodes

Oozie web console

The Oozie command line

Summary

3. Oozie Fundamentals

Chapter case study

The Decision node

The Email action

Expression Language functions

Basic EL constants

Basic EL functions

Workflow EL functions

Hadoop EL constants

HDFS EL functions

Email action configuration

Job property file

Submission from the command line

Workflow states

Summary

4. Running MapReduce Jobs

Chapter case study

Running MapReduce jobs from Oozie

The job.properties file

Running the job

Running Oozie MapReduce job

Coordinators

Datasets

Frequency and time

Cron syntax for frequency

Timezone

The tag

Initial instance

My first Coordinator

Coordinator v1 definition

job.properties v1 definition

Coordinator v2 definition

job.properties v2 definition

Checking the job log

Running a MapReduce streaming job

Summary

5. Running Pig Jobs

Chapter case study

The Pig command line

The config-default.xml file

Pig action

Pig Coordinator job v2

Parameters in the Dataset's input and output events

current(int n)

hoursInDay(int n)

daysInMonth(int n)

latest(int n)

Coordinator controls

Pig Coordinator job v3

Summary

6. Running Hive Jobs

Chapter case study

Running a Hive job from the command line

Hive action

Validating Oozie Workflow

Hive 2 action

Parameterization of Coordinator jobs

dateOffset(String baseDate, int instance, String timeUnit)

dateTzOffet(String baseDate, String timezone)

formatTime(String timeStamp, String format)

Summary

7. Running Sqoop Jobs

Chapter case study

Running Sqoop command line

Sqoop action

HCatalog

HCatalog datasets

HCatalog EL functions

HCatalog Coordinator functions

Pig script

The job.properties file

The Sqoop action Coordinator

Running the job

Checking data in the Hive table

Summary

8. Running Spark Jobs

Spark action

Bundles

Data pipelines

Summary

9. Running Oozie in Production

Packaging and continuous delivery

Oozie in secured cluster

Rerun

Rerun Workflow

Rerun Coordinator

Rerun Bundle

Summary

Index

Apache Oozie Essentials

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: December 2015

Production reference: 1011215

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78588-038-4

www.packtpub.com

Credits

Author

Jagat Jasjit Singh

Reviewers

Siva Prakash

Rahul Tekchandani

Commissioning Editor

Dipika Gaonkar

Acquisition Editor

Tushar Gupta

Content Development Editor

Preeti Singh

Technical Editor

Dhiraj Chandanshive

Copy Editor

Roshni Banerjee

Project Coordinator

Shweta H Birwatkar

Proofreader

Safis Editing

Indexer

Priya Sane

Production Coordinator

Melwyn Dsa

Cover Work

Melwyn Dsa

About the Author

Jagat Jasjit Singh works for one of the largest telecom companies in Melbourne, Australia, as a big data architect. He has a total experience of over 10 years and has been working with the Hadoop ecosystem for more than 5 years. He is skilled in Hadoop, Spark, Oozie, Hive, Pig, Scala, machine learning, HBase, Falcon, Kakfa, GraphX, Flume, Knox, Sqoop, Mesos, Marathon, Chronos, Openstack, and Java. He has experience of a variety of Australian and European customer implementations. He actively writes on Big Data and IoT technologies on his personal blog (http://jugnu.life). Jugnu (a Punjabi word) is a firefly that glows at night and illuminates the world with its tiny light. Jagat believes in this same philosophy of sharing knowledge to make the world a better place. You can connect with him on LinkedIn at https://au.linkedin.com/in/jagatsingh.

All the (author side) earnings of this book will go towards charity. Please consider donating, if you have not purchased this book directly, at http://www.pingalwara.net/donations.html. You can donate with your PayPal account or credit card.

This book is dedicated to Almighty God, who gave me everything, my parents, and the wonderful people from the Omnia project at Commonwealth Bank of Australia (https://github.com/CommBank). I would like to acknowledge the help of Tushar Gupta, Dhiraj Chandanshive, Roshni Banerjee, and Preeti Singh from Packt Publishing in writing this book.

About the Reviewers

Siva Prakash has been working in the field of software development for the last 7 years. Currently, he is working with CISCO, Bangalore. He has an extensive development experience in desktop-, mobile-, and web-based applications in ERP, telecom, and the digital media industry. He has passion for learning new technologies and sharing knowledge thus gained with others. He has worked on big data technologies for the digital media industry. He loves trekking, travelling, music, reading books, and blogging.

He is available on LinkedIn at https://www.linkedin.com/in/techsivam.

Rahul Tekchandani is a Hadoop software developer who specializes in building and developing Hadoop data platforms for big financial institutions. With experience in software design, development, and support, he has engineered strong, data-driven applications using the Cloudera's Hadoop Distribution. Rahul has also worked as an information architect to support data sanitization and data governance.

Prior to his career in software development, he completed his masters in Management of Information Systems at University of Arizona and worked on academic projects for top tech and banking companies.

He currently lives in Charlotte, North Carolina. Visit his developer's blog at www.rahultekchandani.com to see what he is currently exploring, and to learn more about him.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

With the increasing popularity of Big Data in enterprise, every day more and more workloads are being shifted to Hadoop.

To run those regular processing jobs on Hadoop, we need a scheduler that can act as cron for all data pipelines. Oozie plays this role in the Big Data world.

This book introduces you to the world of Oozie using a step-by-step case study-based approach.

What this book covers

Chapter 1, Setting up Oozie, covers how to install and configure Oozie in Hadoop cluster. We will also learn how to install Oozie from the source code.

Chapter 2, My First Oozie Job, covers running a Hello World equivalent first Oozie job. It

Enjoying the preview?

Page 1 of 1

Apache Oozie Essentials

About this ebook

Singh Jagat Jasjit

Related authors

Related to Apache Oozie Essentials

Related ebooks

Computers For You

Related podcast episodes

Related articles

Related categories

Reviews for Apache Oozie Essentials

What did you think?

Book preview

Apache Oozie Essentials - Singh Jagat Jasjit

Table of Contents

Apache Oozie Essentials

Apache Oozie Essentials

Credits

About the Author

About the Reviewers

Support files, eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers