Getting Started with Greenplum for Big Data Analytics

Ebook341 pages2 hours

Getting Started with Greenplum for Big Data Analytics

Name: Getting Started with Greenplum for Big Data Analytics
Author: Gollapudi Sunila
ISBN: 9781782177050

By Gollapudi Sunila

Rating: 0 out of 5 stars

()

Read preview

About this ebook

Standard tutorial-based approach."Getting Started with Greenplum for Big Data" Analytics is great for data scientists and data analysts with a basic knowledge of Data Warehousing and Business Intelligence platforms who are new to Big Data and who are looking to get a good grounding in how to use the Greenplum Platform. It’s assumed that you will have some experience with database design and programming as well as be familiar with analytics tools like R and Weka.

Skip carousel

LanguageEnglish

PublisherPackt Publishing

Release dateOct 23, 2013

ISBN9781782177050

Author

Gollapudi Sunila

Related authors

Skip carousel

Related to Getting Started with Greenplum for Big Data Analytics

Related ebooks

Skip carousel

PYTHON FOR DATA ANALYTICS: Mastering Python for Comprehensive Data Analysis and Insights (2023 Guide for Beginners)
Ebook
PYTHON FOR DATA ANALYTICS: Mastering Python for Comprehensive Data Analysis and Insights (2023 Guide for Beginners)
byWaldo Todd
Rating: 0 out of 5 stars
0 ratings
HDInsight Essentials - Second Edition
Ebook
HDInsight Essentials - Second Edition
byRajesh Nadipalli
Rating: 0 out of 5 stars
0 ratings
Learning Tableau 10 - Second Edition
Ebook
Learning Tableau 10 - Second Edition
byJoshua N. Milligan
Rating: 4 out of 5 stars
4/5
Expert T-SQL Window Functions in SQL Server 2019: The Hidden Secret to Fast Analytic and Reporting Queries
Ebook
Expert T-SQL Window Functions in SQL Server 2019: The Hidden Secret to Fast Analytic and Reporting Queries
byKathi Kellenberger
Rating: 0 out of 5 stars
0 ratings
Monitoring Hadoop
Ebook
Monitoring Hadoop
byGurmukh Singh
Rating: 0 out of 5 stars
0 ratings
Data Fluency: Empowering Your Organization with Effective Data Communication
Ebook
Data Fluency: Empowering Your Organization with Effective Data Communication
byZach Gemignani
Rating: 2 out of 5 stars
2/5
Learning Tableau 2019 - Third Edition: Tools for Business Intelligence, data prep, and visual analytics, 3rd Edition
Ebook
Learning Tableau 2019 - Third Edition: Tools for Business Intelligence, data prep, and visual analytics, 3rd Edition
byJoshua N. Milligan
Rating: 0 out of 5 stars
0 ratings
Hadoop Blueprints
Ebook
Hadoop Blueprints
byAnurag Shrivastava
Rating: 0 out of 5 stars
0 ratings
Oracle Warehouse Builder 11g: Getting Started
Ebook
Oracle Warehouse Builder 11g: Getting Started
byBob Griesemer
Rating: 0 out of 5 stars
0 ratings
Tableau Desktop Certified Associate: Exam Guide: Develop your Tableau skills and prepare for Tableau certification with tips from industry experts
Ebook
Tableau Desktop Certified Associate: Exam Guide: Develop your Tableau skills and prepare for Tableau certification with tips from industry experts
byDmitry Anoshin
Rating: 0 out of 5 stars
0 ratings
Spark for Data Science
Ebook
Spark for Data Science
bySrinivas Duvvuri
Rating: 0 out of 5 stars
0 ratings
Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial
Ebook
Getting Started with Oracle Data Integrator 11g: A Hands-On Tutorial
byDavid Hecksel
Rating: 5 out of 5 stars
5/5
Learn Hadoop in 24 Hours
Ebook
Learn Hadoop in 24 Hours
byAlex Nordeen
Rating: 0 out of 5 stars
0 ratings
Real-Time Big Data Analytics
Ebook
Real-Time Big Data Analytics
byShilpi
Rating: 5 out of 5 stars
5/5
Learning Tableau
Ebook
Learning Tableau
byJoshua N. Milligan
Rating: 0 out of 5 stars
0 ratings
IBM Cognos 10 Framework Manager
Ebook
IBM Cognos 10 Framework Manager
byTerry Curran
Rating: 0 out of 5 stars
0 ratings
Mastering PostgreSQL 9.6
Ebook
Mastering PostgreSQL 9.6
byHans-Jürgen Schönig
Rating: 0 out of 5 stars
0 ratings
Data Lake Development with Big Data
Ebook
Data Lake Development with Big Data
byPasupuleti Pradeep
Rating: 0 out of 5 stars
0 ratings
Hadoop BIG DATA Interview Questions You'll Most Likely Be Asked
Ebook
Hadoop BIG DATA Interview Questions You'll Most Likely Be Asked
byVibrant Publishers
Rating: 0 out of 5 stars
0 ratings
Scalable Big Data Architecture: A practitioners guide to choosing relevant Big Data architecture
Ebook
Scalable Big Data Architecture: A practitioners guide to choosing relevant Big Data architecture
byBahaaldine Azarmi
Rating: 0 out of 5 stars
0 ratings
PYTHON DATA ANALYTICS: Mastering Python for Effective Data Analysis and Visualization (2024 Beginner Guide)
Ebook
PYTHON DATA ANALYTICS: Mastering Python for Effective Data Analysis and Visualization (2024 Beginner Guide)
byFLOYD BAX
Rating: 0 out of 5 stars
0 ratings
My Part-Time Study Notes on Mssql Server
Ebook
My Part-Time Study Notes on Mssql Server
byMorris Sebenzile Mntoninzi
Rating: 0 out of 5 stars
0 ratings
Data Modeling with SAP BW/4HANA 2.0: Implementing Agile Data Models Using Modern Modeling Concepts
Ebook
Data Modeling with SAP BW/4HANA 2.0: Implementing Agile Data Models Using Modern Modeling Concepts
byKonrad Zaleski
Rating: 0 out of 5 stars
0 ratings
Mastering Azure Synapse Analytics: Learn how to develop end-to-end analytics solutions with Azure Synapse Analytics (English Edition)
Ebook
Mastering Azure Synapse Analytics: Learn how to develop end-to-end analytics solutions with Azure Synapse Analytics (English Edition)
byDebananda Ghosh
Rating: 0 out of 5 stars
0 ratings
Spark SQL A Complete Guide
Ebook
Spark SQL A Complete Guide
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Graph Analytics A Clear and Concise Reference
Ebook
Graph Analytics A Clear and Concise Reference
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
SAS Viya: The Python Perspective
Ebook
SAS Viya: The Python Perspective
byKevin D. Smith
Rating: 0 out of 5 stars
0 ratings
Data Modeling A Complete Guide - 2021 Edition
Ebook
Data Modeling A Complete Guide - 2021 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Azure Databricks A Complete Guide - 2020 Edition
Ebook
Azure Databricks A Complete Guide - 2020 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Data Analysis and Harmonization: A Simple Guide
Ebook
Data Analysis and Harmonization: A Simple Guide
byJeff Voivoda
Rating: 0 out of 5 stars
0 ratings

Data Visualization For You

Skip carousel

Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
Ebook
Excel Essentials: A Step-by-Step Guide with Pictures for Absolute Beginners to Master the Basics and Start Using Excel with Confidence
byNigel Tillery
Rating: 0 out of 5 stars
0 ratings
Functional Aesthetics for Data Visualization
Ebook
Functional Aesthetics for Data Visualization
byVidya Setlur
Rating: 0 out of 5 stars
0 ratings
DAX Patterns: Second Edition
Ebook
DAX Patterns: Second Edition
byMarco Russo
Rating: 5 out of 5 stars
5/5
The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios
Ebook
The Big Book of Dashboards: Visualizing Your Data Using Real-World Business Scenarios
bySteve Wexler
Rating: 4 out of 5 stars
4/5
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
Ebook
Machine Learning for Beginners: An Introduction for Beginners, Why Machine Learning Matters Today and How Machine Learning Networks, Algorithms, Concepts and Neural Networks Really Work
bySteven Cooper
Rating: 4 out of 5 stars
4/5
Data Analytics for Beginners: Introduction to Data Analytics
Ebook
Data Analytics for Beginners: Introduction to Data Analytics
byAnthony S. Williams
Rating: 4 out of 5 stars
4/5
Winning The Room: Creating and Delivering an Effective Data-Driven Presentation
Ebook
Winning The Room: Creating and Delivering an Effective Data-Driven Presentation
byBill Franks
Rating: 0 out of 5 stars
0 ratings
D3.js in Action: Data visualization with JavaScript
Ebook
D3.js in Action: Data visualization with JavaScript
byElijah Meeks
Rating: 0 out of 5 stars
0 ratings
Teach Yourself VISUALLY Power BI
Ebook
Teach Yourself VISUALLY Power BI
byAlexander Loth
Rating: 0 out of 5 stars
0 ratings
How to be Clear and Compelling with Data: Principles, Practice and Getting Beyond the Basics
Ebook
How to be Clear and Compelling with Data: Principles, Practice and Getting Beyond the Basics
byJohn J Burrett
Rating: 0 out of 5 stars
0 ratings
Getting to Know ArcGIS Desktop 10.8
Ebook
Getting to Know ArcGIS Desktop 10.8
byMichael Law
Rating: 4 out of 5 stars
4/5
Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals
Ebook
Effective Data Storytelling: How to Drive Change with Data, Narrative and Visuals
byBrent Dykes
Rating: 4 out of 5 stars
4/5
Cool Infographics: Effective Communication with Data Visualization and Design
Ebook
Cool Infographics: Effective Communication with Data Visualization and Design
byRandy Krum
Rating: 4 out of 5 stars
4/5
Learning Tableau 2019 - Third Edition: Tools for Business Intelligence, data prep, and visual analytics, 3rd Edition
Ebook
Learning Tableau 2019 - Third Edition: Tools for Business Intelligence, data prep, and visual analytics, 3rd Edition
byJoshua N. Milligan
Rating: 0 out of 5 stars
0 ratings
Learning Tableau
Ebook
Learning Tableau
byJoshua N. Milligan
Rating: 0 out of 5 stars
0 ratings
Mastering Excel: Excel Apps
Ebook
Mastering Excel: Excel Apps
byMark Moore
Rating: 3 out of 5 stars
3/5
Learning PySpark
Ebook
Learning PySpark
byTomasz Drabas
Rating: 0 out of 5 stars
0 ratings
The Chicago Guide to Writing About Numbers
Ebook
The Chicago Guide to Writing About Numbers
byJane E. Miller
Rating: 0 out of 5 stars
0 ratings
#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time
Ebook
#MakeoverMonday: Improving How We Visualize and Analyze Data, One Chart at a Time
byEva Murray
Rating: 0 out of 5 stars
0 ratings
Practical Data Analysis Cookbook
Ebook
Practical Data Analysis Cookbook
byTomasz Drabas
Rating: 0 out of 5 stars
0 ratings
Python For Beginners.Learn Data Science in 5 Days the Smart Way and Remember it Longer. With Easy Step by Step Guidance & Hands on Examples. (Python Crash Course-Programming for Beginners): Python for Beginners
Ebook
Python For Beginners.Learn Data Science in 5 Days the Smart Way and Remember it Longer. With Easy Step by Step Guidance & Hands on Examples. (Python Crash Course-Programming for Beginners): Python for Beginners
byArthur T. Brooks
Rating: 0 out of 5 stars
0 ratings
Excel for Beginners 2023: A Step-by-Step and Comprehensive Guide to Master the Basics of Excel, with Formulas, Functions, & Charts
Ebook
Excel for Beginners 2023: A Step-by-Step and Comprehensive Guide to Master the Basics of Excel, with Formulas, Functions, & Charts
byGerald Stroud
Rating: 0 out of 5 stars
0 ratings
Google Analytics 4 Migration Quick Guide 2022: Universal Analytics disappears in July 2023 - are you ready?
Ebook
Google Analytics 4 Migration Quick Guide 2022: Universal Analytics disappears in July 2023 - are you ready?
byWynne Pirini
Rating: 0 out of 5 stars
0 ratings
Learning pandas - Second Edition
Ebook
Learning pandas - Second Edition
byHeydt Michael
Rating: 4 out of 5 stars
4/5
The Applied SQL Data Analytics Workshop - Second Edition: Develop your practical skills and prepare to become a professional data analyst, 2nd Edition
Ebook
The Applied SQL Data Analytics Workshop - Second Edition: Develop your practical skills and prepare to become a professional data analyst, 2nd Edition
byMatt Goldwasser
Rating: 0 out of 5 stars
0 ratings
Advanced Analytics with Power BI and Excel: Learn powerful visualization and data analysis techniques using Microsoft BI tools along with Python and R
Ebook
Advanced Analytics with Power BI and Excel: Learn powerful visualization and data analysis techniques using Microsoft BI tools along with Python and R
byDejan Sarka
Rating: 0 out of 5 stars
0 ratings
GIS Tutorial for ArcGIS Pro 2.8
Ebook
GIS Tutorial for ArcGIS Pro 2.8
byWilpen L. Gorr
Rating: 5 out of 5 stars
5/5
Mastering Text Mining with R
Ebook
Mastering Text Mining with R
byAvinash Paul
Rating: 0 out of 5 stars
0 ratings
Creating Data Stories with Tableau Public
Ebook
Creating Data Stories with Tableau Public
byOhmann Ashley
Rating: 0 out of 5 stars
0 ratings
Data Visualization: A Practical Introduction
Ebook
Data Visualization: A Practical Introduction
byKieran Healy
Rating: 5 out of 5 stars
5/5

Related podcast episodes

Skip carousel

Ali Ghodsi – The Past, Present, and Future of Big Data – [Founder’s Field Guide, EP.18]: My Guest today is Ali Ghodsi, founder and CEO of Databricks, a data analytics platform for data scientists and developers. He's also the founder of Apache Spark, the open-source project that Databricks is built on, and is an accomplished researcher at...
Podcast episode
Ali Ghodsi – The Past, Present, and Future of Big Data – [Founder’s Field Guide, EP.18]: My Guest today is Ali Ghodsi, founder and CEO of Databricks, a data analytics platform for data scientists and developers. He's also the founder of Apache Spark, the open-source project that Databricks is built on, and is an accomplished researcher at...
byInvest Like the Best with Patrick O'Shaughnessy
0 ratings
0% found this document useful
Building A Cost Effective Data Catalog With Tree Schema - Episode 158: An interview about the Tree Schema data catalog platform and using it to quickly get visibility into your data assets.
Podcast episode
Building A Cost Effective Data Catalog With Tree Schema - Episode 158: An interview about the Tree Schema data catalog platform and using it to quickly get visibility into your data assets.
byData Engineering Podcast
0 ratings
0% found this document useful
Automate Your Pipeline Creation For Streaming Data Transformations With SQLake: Managing end-to-end data flows becomes complex and unwieldy as the scale of data and its variety of applications in an organization grows. Part of this complexity is due to the transformation and orchestration of data living in disparate systems. The team at Upsolver is taking aim at this problem with the latest iteration of their platform in the form of SQLake. In this episode Ori Rafael explains how they are automating the creation and scheduling of orchestration flows and their related transforations in a unified SQL interface.
Podcast episode
Automate Your Pipeline Creation For Streaming Data Transformations With SQLake: Managing end-to-end data flows becomes complex and unwieldy as the scale of data and its variety of applications in an organization grows. Part of this complexity is due to the transformation and orchestration of data living in disparate systems. The team at Upsolver is taking aim at this problem with the latest iteration of their platform in the form of SQLake. In this episode Ori Rafael explains how they are automating the creation and scheduling of orchestration flows and their related transforations in a unified SQL interface.
byData Engineering Podcast
0 ratings
0% found this document useful
#54 Women in Data Science
Podcast episode
#54 Women in Data Science
byDataFramed
0 ratings
0% found this document useful
Simplifying Data Integration Through Eventual Connectivity - Episode 91: An interview about a new pattern for data integration that reduces the amount of effort required to find connections in numerous data sets
Podcast episode
Simplifying Data Integration Through Eventual Connectivity - Episode 91: An interview about a new pattern for data integration that reduces the amount of effort required to find connections in numerous data sets
byData Engineering Podcast
0 ratings
0% found this document useful
A Multipurpose Database For Transactions And Analytics To Simplify Your Data Architecture With Singlestore: An interview with Shireesh Thota about how the Singlestore database engine allows you to reduce architectural sprawl in your data systems by combining performant and scalable transactional and analytical capabilities into a single platform
Podcast episode
A Multipurpose Database For Transactions And Analytics To Simplify Your Data Architecture With Singlestore: An interview with Shireesh Thota about how the Singlestore database engine allows you to reduce architectural sprawl in your data systems by combining performant and scalable transactional and analytical capabilities into a single platform
byData Engineering Podcast
0 ratings
0% found this document useful
Renee M. P. Teate, "SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis" (John Wiley & Sons, 2021): An interview with Renee M. P. Teate
Podcast episode
Renee M. P. Teate, "SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis" (John Wiley & Sons, 2021): An interview with Renee M. P. Teate
byNew Books in Science, Technology, and Society
0 ratings
0% found this document useful
An Agile Approach To Master Data Management with Mark Marinelli - Episode 46: Building A Master Data Catalog Using Machine Learning (Interview)
Podcast episode
An Agile Approach To Master Data Management with Mark Marinelli - Episode 46: Building A Master Data Catalog Using Machine Learning (Interview)
byData Engineering Podcast
100%
100% found this document useful
Analyze Massive Data At Interactive Speeds With The Power Of Bitmaps Using FeatureBase: An interview with Matt Jaffee about FeatureBase, an open source bitmap database that allows you to query and analyze massive data sets at interactive speeds and the work they have done to simplify integration with the rest of your data platform.
Podcast episode
Analyze Massive Data At Interactive Speeds With The Power Of Bitmaps Using FeatureBase: An interview with Matt Jaffee about FeatureBase, an open source bitmap database that allows you to query and analyze massive data sets at interactive speeds and the work they have done to simplify integration with the rest of your data platform.
byData Engineering Podcast
0 ratings
0% found this document useful
2155: Databricks - The Story Behind the Lakehouse Company: Many are citing open source as the future. The UK Government's National Data Strategy even talks about the importance of opening public sector datasets to form the backbone of innovation, efficiency, and growth. This is a trend that Databricks...
Podcast episode
2155: Databricks - The Story Behind the Lakehouse Company: Many are citing open source as the future. The UK Government's National Data Strategy even talks about the importance of opening public sector datasets to form the backbone of innovation, efficiency, and growth. This is a trend that Databricks...
byThe Tech Talks Daily Podcast
0 ratings
0% found this document useful
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
Podcast episode
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
byDataFramed
0 ratings
0% found this document useful
#92 Democratizing Data in Large Enterprises
Podcast episode
#92 Democratizing Data in Large Enterprises
byDataFramed
0 ratings
0% found this document useful
Building A Self Service Data Platform For Alternative Data Analytics At YipitData: An interview with the YipitData team about how they built a self service platform for building analytics products on alternative data sets to power investment strategies.
Podcast episode
Building A Self Service Data Platform For Alternative Data Analytics At YipitData: An interview with the YipitData team about how they built a self service platform for building analytics products on alternative data sets to power investment strategies.
byData Engineering Podcast
0 ratings
0% found this document useful
Real World Change Data Capture At Datacoral: An interview with Raghu Murthy about the reality of running and maintaining change data capture pipelines for customers at Datacoral.
Podcast episode
Real World Change Data Capture At Datacoral: An interview with Raghu Murthy about the reality of running and maintaining change data capture pipelines for customers at Datacoral.
byData Engineering Podcast
0 ratings
0% found this document useful
40: Should data visualization work be outsourced? w/ Mustafa Mustafa: BI tools change by the minute, so have you ever considered outsourcing your data visualization needs in the future? Maybe you should, especially if you don’t have proper in-house skill sets. Don’t risk your reputation because users can’t unsee a...
Podcast episode
40: Should data visualization work be outsourced? w/ Mustafa Mustafa: BI tools change by the minute, so have you ever considered outsourcing your data visualization needs in the future? Maybe you should, especially if you don’t have proper in-house skill sets. Don’t risk your reputation because users can’t unsee a...
byAnalytics on Fire
0 ratings
0% found this document useful
Streaming Data Pipelines Made SQL With Decodable: An interview with Eric Sammer about the difficulty of working with streaming engines at a low level of abstraction and how he and his team at Decodable are working to make development of streaming data pipelines as straightforward as writing SQL
Podcast episode
Streaming Data Pipelines Made SQL With Decodable: An interview with Eric Sammer about the difficulty of working with streaming engines at a low level of abstraction and how he and his team at Decodable are working to make development of streaming data pipelines as straightforward as writing SQL
byData Engineering Podcast
0 ratings
0% found this document useful
#122 How Organizations Can Bridge the Data Literacy Gap
Podcast episode
#122 How Organizations Can Bridge the Data Literacy Gap
byDataFramed
0 ratings
0% found this document useful
Revisit The Fundamental Principles Of Working With Data To Avoid Getting Caught In The Hype Cycle: The data ecosystem has seen a constant flurry of activity for the past several years, and it shows no signs of slowing down. With all of the products, techniques, and buzzwords being discussed it can be easy to be overcome by the hype. In this episode Juan Sequeda and Tim Gasper from data.world share their views on the core principles that you can use to ground your work and avoid getting caught in the hype cycles.
Podcast episode
Revisit The Fundamental Principles Of Working With Data To Avoid Getting Caught In The Hype Cycle: The data ecosystem has seen a constant flurry of activity for the past several years, and it shows no signs of slowing down. With all of the products, techniques, and buzzwords being discussed it can be easy to be overcome by the hype. In this episode Juan Sequeda and Tim Gasper from data.world share their views on the core principles that you can use to ground your work and avoid getting caught in the hype cycles.
byData Engineering Podcast
0 ratings
0% found this document useful
Data Modeling That Evolves With Your Business Using Data Vault - Episode 119: An interview about the data vault method of data modeling and how it simplifies integrating the evolving data sources that you are dealing with in your enterprise data warehouse
Podcast episode
Data Modeling That Evolves With Your Business Using Data Vault - Episode 119: An interview about the data vault method of data modeling and how it simplifies integrating the evolving data sources that you are dealing with in your enterprise data warehouse
byData Engineering Podcast
0 ratings
0% found this document useful
#124 Using AI to Improve Data Quality in Healthcare
Podcast episode
#124 Using AI to Improve Data Quality in Healthcare
byDataFramed
0 ratings
0% found this document useful
Episode 098 - Generative AI and How It Is Redefining Developer Roles Today With Alyssa Lee: In this episode, Linda and Dave welcome back Alys…
Podcast episode
Episode 098 - Generative AI and How It Is Redefining Developer Roles Today With Alyssa Lee: In this episode, Linda and Dave welcome back Alys…
byAWS Developers Podcast
0 ratings
0% found this document useful
433: Falling for FastAPI: Mike's falling in love with FastAPI and gives us a hint at the next project he's building.
Podcast episode
433: Falling for FastAPI: Mike's falling in love with FastAPI and gives us a hint at the next project he's building.
byCoder Radio
0 ratings
0% found this document useful
Build Your Data Analytics Like An Engineer - Episode 81: An interview about how dbt enables your data teams to build better analytics in your data warehouse
Podcast episode
Build Your Data Analytics Like An Engineer - Episode 81: An interview about how dbt enables your data teams to build better analytics in your data warehouse
byData Engineering Podcast
0 ratings
0% found this document useful
Open Source TensorFlow with Yifei Feng: Yifei Feng, a TensorFlow software engineer, shares with Melanie and Mark about her work on the open source TensorFlow project and the tools she builds.
Podcast episode
Open Source TensorFlow with Yifei Feng: Yifei Feng, a TensorFlow software engineer, shares with Melanie and Mark about her work on the open source TensorFlow project and the tools she builds.
byGoogle Cloud Platform Podcast
100%
100% found this document useful
Building LLM-Based Applications with Azure OpenAI with Jay Emery - #657
Podcast episode
Building LLM-Based Applications with Azure OpenAI with Jay Emery - #657
byThe TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
0 ratings
0% found this document useful
23: Growing a data community to 200K Followers in 2 years w/ Mike Delgado of Experian: Have you ever thought about starting your own data community? Ever wondered what really goes into it? Data communities are hot and have become the number one source for learning and networking! Our guest today talks to us about exactly how he grew...
Podcast episode
23: Growing a data community to 200K Followers in 2 years w/ Mike Delgado of Experian: Have you ever thought about starting your own data community? Ever wondered what really goes into it? Data communities are hot and have become the number one source for learning and networking! Our guest today talks to us about exactly how he grew...
byAnalytics on Fire
0 ratings
0% found this document useful
Low Friction Data Governance With Immuta: An interview about how the Immuta platform simplifies the work of managing access control and data security as part of your data governance strategy.
Podcast episode
Low Friction Data Governance With Immuta: An interview about how the Immuta platform simplifies the work of managing access control and data security as part of your data governance strategy.
byData Engineering Podcast
0 ratings
0% found this document useful
#121 — ChatGPT and How Generative AI is Augmenting Workflows
Podcast episode
#121 — ChatGPT and How Generative AI is Augmenting Workflows
byDataFramed
0 ratings
0% found this document useful
Cloud Dataflow with Eric Anderson: Batch and stream processing systems have been evolving for the past decade. From MapReduce to Apache Storm to Dataflow, the best practices for large volume data processing have become more sophisticated as the industry and open source communities have ...
Podcast episode
Cloud Dataflow with Eric Anderson: Batch and stream processing systems have been evolving for the past decade. From MapReduce to Apache Storm to Dataflow, the best practices for large volume data processing have become more sophisticated as the industry and open source communities have ...
byCloud Engineering Archives - Software Engineering Daily
0 ratings
0% found this document useful
007: Data Cleansing & Analysis with Oz du Soleil: Oz du Soleil is an Excel MVP since 2015 and is an expert in data cleansing & analysis. He has an Excel blog over at www.datascopic.net which is his commitment to data literacy. He’s the leading author on the revised version of Guerrilla Data...
Podcast episode
007: Data Cleansing & Analysis with Oz du Soleil: Oz du Soleil is an Excel MVP since 2015 and is an expert in data cleansing & analysis. He has an Excel blog over at www.datascopic.net which is his commitment to data literacy. He’s the leading author on the revised version of Guerrilla Data...
byLearn Microsoft Excel with MyExcelOnline
0 ratings
0% found this document useful

Skip carousel

Understanding ELT & ETL
Techfastly
Article
Understanding ELT & ETL
Apr 1, 2021
8 min read
Data Fabric
PC Pro Magazine
Article
Data Fabric
Aug 13, 2020
3 min read
Your Data Is Yours
Linux Format
Article
Your Data Is Yours
May 4, 2021
“It’s nice to let someone else take over occasionally. For example, you might want to hand over database management to a service provider to free up capacity and focus on key business initiatives. Managed databases are increasingly popular – in our l
1 min read
Grafana Terminology
Linux Format
Article
Grafana Terminology
Jan 14, 2020
A Grafana data source is a database, file or service that provides data to Grafana – it cannot operate without data. A Grafana panel is the basic building block of Grafana. Panels are made of visualisations or queries. A Grafana query is used for req
1 min read
What is ELT?
Techfastly
Article
What is ELT?
Apr 1, 2021
It stands for extract, load, and transform- the processes a data pipeline uses for replicating the data from a source system into a target system such as a cloud data warehouse. 1. Extraction is the first step in which data is copied from the source
6 min read
01 Giving Data Collectors—and Donors—a Real-Time Rush
Fast Company
Article
01 Giving Data Collectors—and Donors—a Real-Time Rush
Mar 20, 2017
7 min read
Types Of Databases
Linux Format
Article
Types Of Databases
Aug 27, 2019
NoSQL databases provide the performance, scalability and stability that’s required by the modern data-driven apps we interact with these days. But that is where the similarity between NoSQL systems end. In fact, it wouldn’t be wrong to say that the o
1 min read
Budget Strategies for Maximizing Big Data
Entrepreneur
Article
Budget Strategies for Maximizing Big Data
Jun 1, 2016
1 min read
Build A Search And Analytic Engine
Linux Format
Article
Build A Search And Analytic Engine
Mar 10, 2020
7 min read
01 Ready Or Not, AI Is Here To Assist You
HWM Singapore
Article
01 Ready Or Not, AI Is Here To Assist You
Jul 11, 2023
4 min read
2 The Use of Python in AI and ML
Techfastly
Article
2 The Use of Python in AI and ML
Nov 30, 2020
3 min read
Getting The edge
The European Business Review
Article
Getting The edge
Feb 25, 2021
7 min read
Buying The Tool
Techfastly
Article
Buying The Tool
Apr 1, 2021
3 min read
Inform And Enhance Your Business With Open Data
PC Pro Magazine
Article
Inform And Enhance Your Business With Open Data
Jun 10, 2021
7 min read
Building Trends, Building Momentum
Facility Management
Article
Building Trends, Building Momentum
Oct 14, 2019
3 min read
Machine-learning On Your Android Phone?
APC
Article
Machine-learning On Your Android Phone?
Dec 30, 2019
4 min read
Data-driven Decision Making That Uses Data, Mind And Heart
The European Business Review
Article
Data-driven Decision Making That Uses Data, Mind And Heart
Jan 31, 2020
14 min read
Decoding The Impact Of AI
Her World Singapore
Article
Decoding The Impact Of AI
May 5, 2023
6 min read
Mining Actionable Information with Smart Capture
The European Business Review
Article
Mining Actionable Information with Smart Capture
May 22, 2018
4 min read
The Machine Learning Revolution
APC
Article
The Machine Learning Revolution
Sep 6, 2021
8 min read
Quantum Leap
Marketing
Article
Quantum Leap
Jul 11, 2019
6 min read
Leadership Forum: Investing in Disruption
Rotman Management
Article
Leadership Forum: Investing in Disruption
Jan 1, 2019
10 min read
Cloudy With No Chance Of Erp
Architectural Review Asia Pacific
Article
Cloudy With No Chance Of Erp
Nov 11, 2019
ERP (enterprise resource planning) was born around the time the first ‘[Something] for Dummies’ book was published*. It’s typically inflexible, uncompromising software designed for large businesses, like banks, large corporations, manufacturing and s
2 min read
ARTIFICIAL INTELLIGENCE (AI) IN SUPPLY CHAIN PLANNING THE Future is Here & Now
The European Business Review
Article
ARTIFICIAL INTELLIGENCE (AI) IN SUPPLY CHAIN PLANNING THE Future is Here & Now
Dec 3, 2019
7 min read
How Google Is Making The AI That Powers Its Products Better.
HWM Singapore
Article
How Google Is Making The AI That Powers Its Products Better.
Jun 3, 2019
3 min read
The Machine Learning Revolution
Maximum PC
Article
The Machine Learning Revolution
Aug 17, 2021
8 min read
Pragmatic Parametricism
Architectural Review Asia Pacific
Article
Pragmatic Parametricism
Nov 13, 2020
4 min read
How Can AI Help Your Business?
PC Pro Magazine
Article
How Can AI Help Your Business?
Jun 8, 2023
7 min read
Salesforce Adding Einstein Analytics Al To Tableau Platform
Techfastly
Article
Salesforce Adding Einstein Analytics Al To Tableau Platform
Feb 4, 2021
3 min read
Why We Need To Fear The Risk Of AI Model Collapse
Evening Standard
Article
Why We Need To Fear The Risk Of AI Model Collapse
Dec 17, 2023
4 min read

Related categories

Skip carousel

Reviews for Getting Started with Greenplum for Big Data Analytics

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Getting Started with Greenplum for Big Data Analytics - Gollapudi Sunila

Getting Started with Greenplum for Big Data Analytics

Credits

Foreword

About the Author

Acknowledgement

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Instant Updates on New Packt Books

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Errata

Piracy

Questions

1. Big Data, Analytics, and Data Science Life Cycle

Enterprise data

Classification

Features

Big Data

So, what is Big Data?

Multi-structured data

Data analytics

Data science

Data science life cycle

Phase 1 – state business problem

Phase 2 – set up data

Phase 3 – explore/transform data

Phase 4 – model

Phase 5 – publish insights

Phase 6 – measure effectiveness

References/Further reading

Summary

2. Greenplum Unified Analytics Platform (UAP)

Big Data analytics – platform requirements

Greenplum Unified Analytics Platform (UAP)

Core components

Greenplum Database

Hadoop (HD)

Chorus

Command Center

Modules

Database modules

HD modules

Data Integration Accelerator (DIA) modules

Core architecture concepts

Data warehousing

Column-oriented databases

Parallel versus distributed computing/processing

Shared nothing, massive parallel processing (MPP) systems, and elastic scalability

Shared disk data architecture

Shared memory data architecture

Shared nothing data architecture

Data loading patterns

Greenplum UAP components

Greenplum Database

The Greenplum Database physical architecture

The Greenplum high-availability architecture

High-speed data loading using external tables

External table types

Polymorphic data storage and historic data management

Data distribution

Hadoop (HD)

Hadoop Distributed File System (HDFS)

Hadoop MapReduce

Chorus

Greenplum Data Computing Appliance (DCA)

Greenplum Data Integration Accelerator (DIA)

References/Further reading

Summary

3. Advanced Analytics – Paradigms, Tools, and Techniques

Analytic paradigms

Descriptive analytics

Predictive analytics

Prescriptive analytics

Analytics classified

Classification

Forecasting or prediction or regression

Clustering

Optimization

Simulations

Modeling methods

Decision trees

Association rules

The Apriori algorithm

Linear regression

Logistic regression

The Naive Bayesian classifier

K-means clustering

Text analysis

R programming

Weka

In-database analytics using MADlib

References/Further reading

Summary

4. Implementing Analytics with Greenplum UAP

Data loading for Greenplum Database and HD

Greenplum data loading options

External tables

gpfdist

gpload

Hadoop (HD) data loading options

Sqoop 2

Greenplum BulkLoader for Hadoop

Using external ETL to load data into Greenplum

Extraction, Load, and Transformation (ELT) and Extraction, Transformation, Load, and Transformation (ETLT)

Greenplum target configuration

Sourcing large volumes of data from Greenplum

Unsupported Greenplum data types

Push Down Optimization (PDO)

Greenplum table distribution and partitioning

Distribution

Data skew and performance

Optimizing the broadcast or redistribution motion for data co-location

Partitioning

Querying Greenplum Database and HD

Querying Greenplum Database

Analyzing and optimizing queries

The ANALYZE function

The EXPLAIN function

Dynamic Pipelining in Greenplum

Querying HDFS

Hive

Pig

Data communication between Greenplum Database and Hadoop (using external tables)

Data Computing Appliance (DCA)

Storage design, disk protection, and fault tolerance

Master server RAID configurations

Segment server RAID configurations

Monitoring DCA

Greenplum Database management

In-database analytics options (Greenplum-specific)

Window functions

The PARTITION BY clause

The ORDER BY clause

The OVER (ORDER BY…) clause

Creating, modifying, and dropping functions

User-defined aggregates

Using R with Greenplum

DBI Connector for R

PL/R

Using Weka with Greenplum

Using MADlib with Greenplum

Using Greenplum Chorus

Pivotal

References/Further reading

Summary

Index

Getting Started with Greenplum for Big Data Analytics

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: October 2013

Production Reference: 1171013

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78217-704-3

www.packtpub.com

Cover Image by Aniket Sawant (<aniket_sawant_photography@hotmail.com>)

Credits

Author

Sunila Gollapudi

Reviewers

Brian Feeny

Scott Kahler

Alan Koskelin

Tuomas Nevanranta

Acquisition Editor

Kevin Colaco

Commissioning Editor

Deepika Singh

Technical Editors

Kanhucharan Panda

Vivek Pillai

Project Coordinator

Amey Sawant

Proofreader

Bridget Braund

Indexer

Mariammal Chettiyar

Graphics

Valentina D'silva

Ronak Dhruv

Abhinash Sahu

Production Coordinator

Adonia Jones

Cover Work

Adonia Jones

Foreword

In the last decade, we have seen the impact of exponential advances in technology on the way we work, shop, communicate, and think. At the heart of this change is our ability to collect and gain insights into data; and comments like Data is the new oil or we have a Data Revolution only amplifies the importance of data in our lives.

Tim Berners-Lee, inventor of the World Wide Web said, Data is a precious thing and will last longer than the systems themselves. IBM recently stated that people create a staggering 2.5 quintillion bytes of data every day (that's roughly equivalent to over half a billion HD movie downloads). This information is generated from a huge variety of sources including social media posts, digital pictures, videos, retail transactions, and even the GPS tracking functions of mobile phones.

This data explosion has led to the term Big Data moving from an Industry buzz word to practically a household term very rapidly. Harnessing Big Data to extract insights is not an easy task; the potential rewards for finding these patterns are huge, but it will require technologists and data scientists to work together to solve these problems.

The book written by Sunila Gollapudi, Getting Started with Greenplum for Big Data Analytics, has been carefully crafted to address the needs of both the technologists and data scientists.

Sunila starts with providing excellent background to the Big Data problem and why new thinking and skills are required. Along with a dive deep into advanced analytic techniques, she brings out the difference in thinking between the new Big Data science and the traditional Business Intelligence, this is especially useful to help understand and bridge the skill gap.

She moves on to discuss the computing side of the equation-handling scale, complexity of data sets, and rapid response times. The key here is to eliminate the noise in data early in the data science life cycle. Here, she talks about how to use one of the industry's leading product platforms like Greenplum to build Big Data solutions with an explanation on the need for a unified platform that can bring essential software components (commercial/open source) together backed by a hardware/appliance.

She then puts the two together to get the desired result—how to get meaning out of Big Data. In the process, she also brings out the capabilities of the R programming language, which is mainly used in the area of statistical computing, graphics, and advanced analytics.

Her easy-to-read practical style of writing with real examples shows her depth of understanding of this subject. The book would be very useful for both data scientists (who need to learn the computing side and technologies to understand) and also for those who aspire to learn data science.

V. Laxmikanth

Managing Director

Broadridge Financial Solutions (India) Private Limited

www.broadridge.com

About the Author

Sunila Gollapudi works as a Technology Architect for Broadridge Financial Solutions Private Limited. She has over 13 years of experience in developing, designing and architecting data-driven solutions with a focus on the banking and financial services domain for around eight years. She drives Big Data and data science practice for Broadridge. Her key roles have been Solutions Architect, Technical leader, Big Data evangelist, and Mentor.

Sunila has a Master's degree in Computer Applications and her passion for mathematics enthused her into data and analytics. She worked on Java, Distributed Architecture, and was a SOA consultant and Integration Specialist before she embarked on her data journey. She is a strong follower of open source technologies and believes in the innovation that open source revolution brings.

She has been a speaker at various conferences and meetups on Java and Big Data. Her current Big Data and data science specialties include Hadoop, Greenplum, R, Weka, MADlib, advanced analytics, machine learning, and data integration tools such as Pentaho and Informatica.

With a unique blend of technology and domain expertise, Sunila has been instrumental in conceptualizing architectural patterns and providing reference architecture for Big Data problems in the financial services domain.

Acknowledgement

It was a pleasure to work with Packt Publishing on this project. Packt has been most accommodating, extremely quick, and responsive to all requests.

I am deeply grateful to Broadridge for providing me the platform to explore and build expertise in Big Data technologies. My greatest gratitude to Laxmikanth V. (Managing Director, Broadridge) and Niladri Ray (Executive Vice President, Broadridge) for all the trust, freedom, and confidence in me.

Thanks to my parents for having relentlessly encouraged me to explore any and every subject that interested me.

Authors usually thank their spouses for their patience and support or words to that effect. Unless one has lived through the actual experience, one cannot fully comprehend how true this is. Over the last ten years, Kalyan has endured what must have seemed like a nearly continuous stream of whining punctuated by occasional outbursts of exhilaration and grandiosity—all of which before the background of the self-absorbed attitude of a typical author. His patience and support were unfailing.

Last but not least, my love, my daughter, my angel, Nikita, who has been my continuous drive. Without her being as accommodative as she was, this book wouldn't have been possible.

About the Reviewers

Brian Feeny is a technologist/evangelist working with many Big Data technologies such as analytics, visualization, data mining, machine learning, and statistics. He is a graduate student in Software Engineering at Harvard University, primarily focused on data science, where he gets to work on interesting data problems using some of the latest methods and technology.

Brian works for Presidio Networked Solutions, where he helps businesses with their Big Data challenges and helps them understand how to make best use of their data.

I would like to thank my wife, Scarlett, for her tolerance of my busy schedule. I would like to thank Presidio, my employer, for investing in in our Big Data practice. Lastly, I would like to thank EMC and Pivotal for the excellent training and support they have given Presidio and myself.

Scott Kahler started down the path in the mid 80s when he disconnected the power LED on his Commodore 64. In this fashion he could run his handwritten Dungeons and Dragons' random character generator, and his parents wouldn't complain about the computer being

Enjoying the preview?

Page 1 of 1

Getting Started with Greenplum for Big Data Analytics

About this ebook

Gollapudi Sunila

Related authors

Related to Getting Started with Greenplum for Big Data Analytics

Related ebooks

Data Visualization For You

Related podcast episodes

Related articles

Related categories

Reviews for Getting Started with Greenplum for Big Data Analytics

What did you think?

Book preview

Getting Started with Greenplum for Big Data Analytics - Gollapudi Sunila

Table of Contents

Getting Started with Greenplum for Big Data Analytics

Getting Started with Greenplum for Big Data Analytics

Credits

Foreword

About the Author

Acknowledgement

About the Reviewers