Anton Radice
eScience Research Institute
Overview:
1. Databases overview
2. MongoDB Introduction
Data format, schema, and modeling
CRUD operations (read, write)
3. Mongo for Big Data
Replication and sharding
Aggregation pipeline
4. Q&A
Databases overview:
SQL
  Data model: relational
  Schema: fixed
  Examples: MySQL, SQLite, PostgreSQL
NoSQL
  Data model: document based
  Schema: flexible
NewSQL
  Data model: hybrid
  Schema: mixed
Graph
  Data model: graph based
  Schema: flexible
  Examples: Neo4j, OrientDB
MongoDB: Introduction
Open-source, cross-platform NoSQL database released in 2009
Written in C, C++, and JavaScript
Drivers for almost all popular languages:
C, C++, Java, Perl, Python, Ruby, Scala, and more
According to the DB-engines.com ranking, MongoDB is the fourth most popular database and the most popular NoSQL database
Cheaper than traditional enterprise systems
4 servers, 2 processors each:
Oracle Enterprise Edition: $456,000.00
MongoDB Subscription: $16,000.00
Features: high availability, journaling, replication, auto-sharding, aggregation framework
Source: indeed.com
MongoDB: Schema
*Collections do not enforce document structure*
{
  name: "denis",
  age: 29,
  occupation: "researcher",
  interests: ["high performance computing", "big data", "long walks on the beach"],
  address: <DenisAddressObject>,    // document reference
  teaching: {
    course_name: "Big Data Technologies",
    semester: "Spring 2016",
    students_count: 11
  }
},
{
  name: "anton",
  age: 24,
  occupation: "student",
  interests: ["data science", "big data", "eating"],
  address: {                        // embedded document
    street: "Beechwood Drive",
    zip_code: 19083,
    country: "USA"
  }
}
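Because collections do not enforce document structure, two documents in the same collection can carry different fields. A minimal sketch of that idea in Python, where plain dicts stand in for documents and a list stands in for a collection. The names `people` and `field_names` are hypothetical, and no MongoDB server or driver is involved:

```python
# Plain dicts stand in for MongoDB documents; a list stands in for a collection.
# No driver or server is used here; this only illustrates the flexible schema.
people = [
    {
        "name": "denis",
        "age": 29,
        "occupation": "researcher",
        "teaching": {  # embedded document
            "course_name": "Big Data Technologies",
            "semester": "Spring 2016",
            "students_count": 11,
        },
    },
    {
        "name": "anton",
        "age": 24,
        "occupation": "student",
        "address": {  # embedded document
            "street": "Beechwood Drive",
            "zip_code": 19083,
            "country": "USA",
        },
    },
]


def field_names(document):
    """Return the set of top-level field names of one document."""
    return set(document)


# Fields present in one document but missing from the other.
only_in_first = field_names(people[0]) - field_names(people[1])
only_in_second = field_names(people[1]) - field_names(people[0])
print(sorted(only_in_first))
print(sorted(only_in_second))
```

Code that reads such a collection should expect missing fields, for example by using `document.get("address")` rather than `document["address"]`.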
Embedded document (one-to-one):
{
  _id: 123,
  name: "John Smith",
  job: {
    position: "accountant",
    company: "Deloitte"
  }
}
Document reference (one-to-one):
{
  _id: 456,
  user_id: 123,
  position: "accountant",
  company: "Deloitte"
}
{
  _id: 123,
  name: "John Smith"
}
Embedded documents (one-to-many):
{
  _id: 123,
  name: "John Smith",
  jobs: [
    {
      position: "accountant",
      company: "Deloitte"
    },
    {
      position: "consultant",
      company: "StartupX"
    }
  ]
}
Document references (one-to-many, each job points to its user):
{
  _id: 456,
  user_id: 123,
  position: "accountant",
  company: "Deloitte"
}
{
  _id: 789,
  user_id: 123,
  position: "consultant",
  company: "StartupX"
}
{
  _id: 123,
  name: "John Smith"
}
Document references (one-to-many, the user also lists its job ids):
{
  _id: 123,
  name: "John Smith",
  jobs: [456, 789]
}
{
  _id: 456,
  user_id: 123,
  position: "accountant",
  company: "Deloitte"
}
{
  _id: 789,
  user_id: 123,
  position: "consultant",
  company: "StartupX"
}
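The trade-off between the two modeling styles above can be sketched in Python: with embedding, one document read returns the user together with the jobs, while with references the application has to look up the related documents itself. Plain dicts and lists stand in for documents and collections, and the helper names `companies_embedded` and `companies_referenced` are hypothetical:

```python
# Plain dicts stand in for documents; lists stand in for collections.
# Embedded model: the jobs live inside the user document.
user_embedded = {
    "_id": 123,
    "name": "John Smith",
    "jobs": [
        {"position": "accountant", "company": "Deloitte"},
        {"position": "consultant", "company": "StartupX"},
    ],
}

# Referenced model: jobs are separate documents pointing back via user_id.
users = [{"_id": 123, "name": "John Smith"}]
jobs = [
    {"_id": 456, "user_id": 123, "position": "accountant", "company": "Deloitte"},
    {"_id": 789, "user_id": 123, "position": "consultant", "company": "StartupX"},
]


def companies_embedded(user):
    """One document read is enough: the jobs travel with the user."""
    return [job["company"] for job in user["jobs"]]


def companies_referenced(user_id, users, jobs):
    """Two lookups: find the user, then scan the jobs that reference it."""
    user = next(u for u in users if u["_id"] == user_id)
    return [job["company"] for job in jobs if job["user_id"] == user["_id"]]


print(companies_embedded(user_embedded))
print(companies_referenced(123, users, jobs))
```

Both calls return the same companies. The difference is where the work happens: embedding favors reads of the whole user, references favor jobs that are updated or queried on their own.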
Aggregation pipeline
Source: docs.mongodb.org
SQL:
SELECT student_id,
       SUM(grades) / SUM(credits) AS gpa
FROM students
GROUP BY student_id
HAVING gpa >= 3
Output:
{
  _id: 718581,
  gpa: 3.56
}
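A hedged sketch of how a GPA query like the SQL above might be written as aggregation stages. It assumes a `students` collection with one document per course taken, where `grades` holds grade points and `credits` the course credits; these field meanings and the sample rows are assumptions, not taken from the slides. The stages are plain Python data, and `gpa_at_least` is a hypothetical pure-Python equivalent so the logic can be checked without a server:

```python
from collections import defaultdict

# Aggregation stages written as plain Python data. A driver such as PyMongo
# accepts a list like this via collection.aggregate(pipeline); that call needs
# a running server and is not executed here.
pipeline = [
    {"$group": {
        "_id": "$student_id",
        "grade_points": {"$sum": "$grades"},
        "credits": {"$sum": "$credits"},
    }},
    {"$project": {"gpa": {"$divide": ["$grade_points", "$credits"]}}},
    {"$match": {"gpa": {"$gte": 3}}},
]

# Hypothetical sample rows: one document per course taken.
students = [
    {"student_id": 718581, "grades": 48, "credits": 12},
    {"student_id": 718581, "grades": 41, "credits": 13},
    {"student_id": 100001, "grades": 20, "credits": 10},
]


def gpa_at_least(rows, threshold):
    """Pure-Python equivalent of the three stages above."""
    totals = defaultdict(lambda: {"grade_points": 0, "credits": 0})
    for row in rows:  # $group: sum grade points and credits per student
        totals[row["student_id"]]["grade_points"] += row["grades"]
        totals[row["student_id"]]["credits"] += row["credits"]
    result = []
    for student_id, t in totals.items():
        # $project: divide the sums (rounded here for display only)
        gpa = round(t["grade_points"] / t["credits"], 2)
        if gpa >= threshold:  # $match: keep students at or above the threshold
            result.append({"_id": student_id, "gpa": gpa})
    return result


print(gpa_at_least(students, 3))  # [{'_id': 718581, 'gpa': 3.56}]
```

The stage order mirrors the SQL clauses: `$group` plays the role of GROUP BY, `$project` computes the `gpa` column, and `$match` after grouping corresponds to HAVING.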
Further resources:
https://docs.mongodb.org/manual/
Detailed documentation
Tutorials
Presentations/webinars
MongoDB University
Any questions?
antonradice@gmail.com