TECHNICAL SKILLS
Hadoop Ecosystem: HDFS, Pig, Hive, Spark, Scala, Spark SQL, YARN, Sqoop, Hue, Oozie, Shell Script
Databases/Tools: Oracle, DB2, MySQL, SQL Server 2008, AQT
Server/Platforms: Unix, CentOS, Windows
WORK EXPERIENCE
Project 1
Company:
Client:
Team Size: 7
Profile: Hadoop and Spark Developer
Duration:
Tools: Oracle, MySQL, HDFS, Hive, Spark SQL, Shell Script
Description: We are building a 360-degree view of the client's complete business across multiple repositories covering different products and purchases.
Responsibilities:
Data extraction from different source servers into the data lake
Data processing using Python and Spark
Data ingestion into the Hadoop data lake
Data validation using quality checks
Job execution monitoring
Job automation using shell scripts
Data pipeline optimization
Working in an Agile model
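The validation step above can be sketched in plain Python (field names and the quality rules here are illustrative placeholders; the production checks ran on Spark DataFrames):

```python
# Minimal sketch of a post-ingestion quality check: verify row count
# and mandatory non-null fields. Names below are illustrative only.

def quality_check(rows, required_fields, min_rows=1):
    """Return a list of validation errors for a batch of ingested records."""
    errors = []
    if len(rows) < min_rows:
        errors.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) in (None, ""):
                errors.append(f"row {i}: missing required field '{field}'")
    return errors

# Example: two customer records, one missing a mandatory field.
batch = [
    {"customer_id": "C001", "product": "savings"},
    {"customer_id": "", "product": "loan"},
]
print(quality_check(batch, ["customer_id", "product"]))
```

A batch that fails any check can then be routed to a reject area for reprocessing instead of being loaded into the lake.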
Project 2
Company:
Client:
Team Size: 5
Profile: Hadoop and Spark Developer
Duration:
Tools: Oracle, HDFS, Pig, Hive, Spark SQL
Database: Oracle
ETL Tools: Pig, Spark SQL
Data Warehouse: Hive
Description: This tool is implemented for banking and financial services and provides several services for batch processing and real-time data streaming. We are working on a stock market project to handle real-time and batch data sourced from Oracle, GMI, Stream core, and the Hadoop ecosystem. For reporting we use the IBM Cognos tool.
Responsibilities:
Data mapping across data lake layers
Worked on more than 70 scenarios for development and enhancement
Data analysis and pattern comparison
Wrote transformation scripts in Pig
Processed several semi-structured dataset formats in Pig, including JSON, XML, CSV, and fixed-length files
Reprocessed bad files in Pig using UDFs
Implemented custom UDFs in Pig
Set up Hadoop and Spark integration from scratch
Data streaming using Spark
Used Spark SQL to connect to the Hive warehouse
Processed large datasets using Spark Datasets
Worked with RDDs, transformations, and actions
Used several Scala library functions to build Spark applications with Spark SQL
Linear regression
Worked in Development, QA, Pre-prod, and Production environments
Migrated data from Oracle to Hive using Sqoop
Daily batch runs and real-time bug fixing
Worked on critical scenarios and delivered feasible, efficient solutions
Product development and release management
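The Oracle-to-Hive migration mentioned above is typically a Sqoop import per table; a minimal Python helper that assembles such a command is sketched below (the JDBC URL, user, table, and database names are placeholders, not the actual project values):

```python
# Assembles a Sqoop import command for an Oracle-to-Hive migration.
# All identifiers below are placeholders for illustration.

def build_sqoop_import(jdbc_url, user, table, hive_db):
    """Return the Sqoop CLI arguments for importing one Oracle table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,   # e.g. jdbc:oracle:thin:@//host:1521/service
        "--username", user,
        "--table", table,
        "--hive-import",         # create and load the target Hive table
        "--hive-database", hive_db,
        "--num-mappers", "4",    # parallel import tasks
    ]

cmd = build_sqoop_import(
    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "etl_user", "TRADES", "stock_dw")
print(" ".join(cmd))
```

Wrapping the invocation in a helper like this makes it easy to drive a full-schema migration from a list of table names.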
Project 3
Company:
Client:
Team Size: 5
Profile: Hadoop Developer
Duration:
Tools: HDFS, MapReduce, Pig, Hive, Hue, Sqoop, Flume, Cloudera cluster
Databases: DB2, SQL Server 2008, Oracle
Description:
Daimler Trucks North America (DTNA) is a brand with several sub-brands that supplies trucks and engines to the market. The main objective of the project is to outsource the strategy, design, development, coding, testing, deployment, and maintenance of the Enterprise Data Repository for reporting purposes, in order to extract the required information from their large volumes of data. Data comes from more than 40 source systems, and the tools used in this project are the Hadoop ecosystem tools listed above.
Responsibilities:
Project 4
Company:
Client:
Team Size: 6
Profile: ETL Developer working on Informatica PowerCenter and Teradata
Duration:
Tools:
Databases: Oracle, DB2, SQL Server
Responsibilities:
Worked as an ETL developer
Also worked as an ETL tester in Dev and QA environments
Worked on SCD Type 1 and SCD Type 2
Created mapping specification documents
Worked on Teradata and implemented test cases with queries that mimic the mappings
Worked in development and enhancement environments
Worked on critical scenarios and delivered feasible, efficient solutions
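The SCD Type 2 handling mentioned above can be sketched in plain Python (the field names are illustrative; in the project this logic was built as Informatica mappings):

```python
from datetime import date

# Minimal SCD Type 2 sketch: expire the current version of a changed
# record and append a new version. Field names are illustrative only.

def apply_scd2(dimension, incoming, key, today):
    """Close the active row for `key` if its value changed, then insert the new row."""
    for row in dimension:
        if row["key"] == key and row["end_date"] is None:
            if row["value"] == incoming:   # no change: keep the active row
                return dimension
            row["end_date"] = today        # close out the old version
    dimension.append({"key": key, "value": incoming,
                      "start_date": today, "end_date": None})
    return dimension

dim = [{"key": "CUST1", "value": "Pune",
        "start_date": date(2020, 1, 1), "end_date": None}]
apply_scd2(dim, "Mumbai", "CUST1", date(2021, 6, 1))
print(dim)
```

SCD Type 1 differs only in that it overwrites the value in place instead of keeping history.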