MongoDB 1
Haopeng Chen
Overview
MongoDB specification
Basic concepts of MongoDB
Basic operations with the shell
MongoDB specification
Document
Simple document
A document is roughly equivalent to a row in a relational database and
contains one or more key-value pairs:
{"greeting" : "Hello, world!"}
Most documents will be more complex than this simple one and often
will contain multiple key/value pairs:
{"greeting" : "Hello, world!", "foo" : 3}
Values in documents are not just blobs: they can be one of several
different data types.
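As a sketch of the point above, here is a document whose values span several types (string, number, date, array, embedded document). This is plain JavaScript, runnable without MongoDB; the field names are illustrative only.

```javascript
// A document mixing several value types (illustrative field names).
var doc = {
  "greeting" : "Hello, world!",   // string
  "foo" : 3,                      // number
  "updated" : new Date(),         // date
  "tags" : ["demo", "intro"],     // array
  "meta" : { "views" : 0 }        // embedded document
};

console.log(typeof doc.foo);        // "number"
console.log(Array.isArray(doc.tags));
```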
Document
Simple document
Keys must not contain the character \0 (the null character).
The . and $ characters have some special properties and should be used
only in certain circumstances
Keys starting with _ should be considered reserved, although this is not
strictly enforced.
MongoDB is type-sensitive and case-sensitive.
{"foo" : 3}
{"foo" : "3"}
{"Foo" : 3}
Document
Embedded document
Embedded documents are entire MongoDB documents that are used as the
value for a key in another document.
{
    "name" : "John Doe",
    "address" : {
        "street" : "123 Park Street",
        "city" : "Anytown",
        "state" : "NY"
    }
}
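Queries reach into embedded documents with a dotted path such as "address.city". The sketch below shows how such a path resolves against the document above; the getPath helper is hypothetical (plain JavaScript), not part of any MongoDB driver.

```javascript
// Hypothetical helper: resolve a dotted path against a nested document.
function getPath(doc, path) {
  return path.split(".").reduce(function (cur, key) {
    return cur == null ? undefined : cur[key];
  }, doc);
}

var person = {
  "name" : "John Doe",
  "address" : { "street" : "123 Park Street", "city" : "Anytown", "state" : "NY" }
};

console.log(getPath(person, "address.city"));  // "Anytown"
```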
Collection
Naming
The empty string ("") is not a valid collection name.
You should not create any collections whose names start with system., a reserved prefix.
User-created collections should not contain the reserved character $ in
the name.
Subcollections
One convention for organizing collections is to use namespaced
subcollections separated by the . character.
For example, an application containing a blog might have a collection
named blog.posts and a separate collection named blog.authors.
This is for organizational purposes only: there is no relationship between
the blog collection (it doesn't even have to exist) and its children.
Database
There are also several reserved database names, which you can
access directly but have special semantics. These are as follows:
admin: This is the root database, in terms of authentication.
local: This database will never be replicated and can be used to store any
collections that should be local to a single server.
config: When Mongo is being used in a sharded setup (see Chapter 10),
the config database is used internally to store information about the
shards.
When run with no arguments, mongod will use the default data
directory, /data/db/ (or C:\data\db\ on Windows), and port
27017.
If the data directory does not already exist or is not writable, the server
will fail to start.
A MongoDB Client
The shell contains some add-ons that are not valid JavaScript
syntax but were implemented because of their familiarity to
users of SQL shells.
Create
Inserting
db.foo.insert({"bar" : "baz"})
Batch Insert
A batch insert is a single TCP request.
Batch inserts are useful only if you are inserting multiple documents into a
single collection: you cannot use one batch insert to insert into multiple
collections.
MongoDB does not accept messages longer than 16 MB, so there is a limit to
how much can be inserted in a single batch insert.
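Because of that message-size cap, a driver facing a large batch must split it into several smaller inserts. The splitter below is a simplified plain-JavaScript illustration of that idea (using JSON string length as a rough stand-in for BSON size), not real driver code.

```javascript
// Split a list of documents into batches that each stay under maxBytes.
function splitBatch(docs, maxBytes) {
  var batches = [], current = [], size = 0;
  docs.forEach(function (doc) {
    var docSize = JSON.stringify(doc).length;  // rough stand-in for BSON size
    if (current.length > 0 && size + docSize > maxBytes) {
      batches.push(current);   // current batch would overflow: start a new one
      current = [];
      size = 0;
    }
    current.push(doc);
    size += docSize;
  });
  if (current.length > 0) batches.push(current);
  return batches;
}

var docs = [];
for (var i = 0; i < 10; i++) docs.push({ "bar" : "baz", "n" : i });
var batches = splitBatch(docs, 60);
console.log(batches.length > 1);  // true: the 10 documents need several batches
```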
Read
Update
Updating
Using Modifiers
Update modifiers are special keys that can be used to specify complex
update operations, such as altering, adding, or removing keys, and even
manipulating arrays and embedded documents.
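The semantics of the two most common modifiers can be sketched in plain JavaScript: "$set" assigns a key and "$inc" adds to a numeric key. The applyUpdate function below is an illustration of what the modifiers mean, not the server's implementation.

```javascript
// Apply "$set" and "$inc" modifiers to a copy of a document.
function applyUpdate(doc, update) {
  var out = Object.assign({}, doc);
  Object.keys(update["$set"] || {}).forEach(function (k) {
    out[k] = update["$set"][k];            // $set: assign the key
  });
  Object.keys(update["$inc"] || {}).forEach(function (k) {
    out[k] = (out[k] || 0) + update["$inc"][k];  // $inc: add to the key
  });
  return out;
}

var joe = { "name" : "joe", "friends" : 32 };
var updated = applyUpdate(joe, { "$set" : { "enemies" : 2 }, "$inc" : { "friends" : 1 } });
console.log(updated);  // { name: 'joe', friends: 33, enemies: 2 }
```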
Delete
Removing documents
db.users.remove()
db.mailing.list.remove({"opt-out" : true})
Drop method
db.bar.drop()
Data Types
Querying
You can perform ad hoc queries on the database using the find
or findOne functions and a query document.
You can query for ranges, set inclusion, inequalities, and more
by using $ conditionals.
Query Criteria
"$lt", "$lte", "$gt", "$gte", and "$ne" are all comparison operators,
corresponding to <, <=, >, >=, and not equal, respectively.
They can be combined to look for a range of values.
e.g.
> db.users.find({"age" : {"$gte" : 18, "$lte" : 30}})
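What those conditionals mean can be sketched in plain JavaScript: a value matches a criteria document when every operator in it holds. The matcher below mirrors the meaning of "$lt"/"$lte"/"$gt"/"$gte"/"$ne", not MongoDB's actual query engine.

```javascript
// Evaluate a conditionals document such as {"$gte": 18, "$lte": 30}
// against a single value.
function matches(value, criteria) {
  return Object.keys(criteria).every(function (op) {
    var bound = criteria[op];
    switch (op) {
      case "$lt":  return value <  bound;
      case "$lte": return value <= bound;
      case "$gt":  return value >  bound;
      case "$gte": return value >= bound;
      case "$ne":  return value !== bound;
      default:     return false;
    }
  });
}

var range = { "$gte" : 18, "$lte" : 30 };
console.log(matches(25, range));  // true
console.log(matches(17, range));  // false
```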
Querying
Cursor Internals
There are two sides to a cursor: the client-facing cursor and the database
cursor that the client-side one represents
Querying
There are a few conditions that can cause the death (and
subsequent cleanup) of a cursor.
First, when a cursor finishes iterating through the matching results, it
will clean itself up.
Second, when a cursor goes out of scope on the client side,
the drivers send the database a special message to let it know that it can
kill that cursor.
Finally, even a cursor that is still in scope and not exhausted will be
killed by the server after a period of inactivity (10 minutes by default).
This death by timeout is usually the desired behavior: very few
applications expect their users to sit around for minutes at a time
waiting for results.
Indexing
Normal index
When only a single key is used in the query, that key can be indexed to
improve the query's speed.
> db.people.ensureIndex({"username" : 1})
As a rule of thumb, you should create an index that contains all of the
keys in your query.
> db.people.find({"date" : date1}).sort({"date" : 1, "username" : 1})
> db.people.ensureIndex({"date" : 1, "username" : 1})
If you have more than one key, you need to start thinking about index
direction.
> db.people.ensureIndex({"username" : 1, "age" : -1})
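The direction pair {"username" : 1, "age" : -1} corresponds to the ordering below: username ascending, then age descending. The comparator is a plain-JavaScript illustration of the order such an index produces, not index code.

```javascript
// Comparator matching the index direction {username: 1, age: -1}.
function byUsernameAscAgeDesc(a, b) {
  if (a.username !== b.username) {
    return a.username < b.username ? -1 : 1;  // username ascending
  }
  return b.age - a.age;                       // age descending
}

var users = [
  { "username" : "bob",   "age" : 23 },
  { "username" : "alice", "age" : 30 },
  { "username" : "alice", "age" : 45 }
];
users.sort(byUsernameAscAgeDesc);
console.log(users.map(function (u) { return u.username + ":" + u.age; }));
// [ 'alice:45', 'alice:30', 'bob:23' ]
```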
Indexing
Geospatial Indexing
A common query is finding the nearest N things to a current location.
MongoDB provides a special type of index for coordinate plane queries,
called a geospatial index.
Aggregation
Count
Returns the number of documents in the collection.
Distinct
Finds all of the distinct values for a given key. You must specify a
collection and key:
> db.runCommand({"distinct" : "people", "key" : "age"})
You will then get all of the distinct ages, like this:
{"values" : [20, 35, 60], "ok" : 1}
There are many optional keys that can be passed to the MapReduce
command:
"finalize" : function, a final step to send the reducer's output to.
"keeptemp" : boolean, whether the temporary result collection should be saved
when the connection is closed.
"query" : document, a query to filter documents by before they are sent to the
map function.
"sort" : document, a sort to apply to documents before they are sent to the
map function (useful in conjunction with a limit).
"scope" : document, variables that can be used in any of the JavaScript code.
"verbose" : boolean, whether to use more verbose output in the server logs.
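The map/reduce data flow itself can be sketched in plain JavaScript: map emits key/value pairs per document, and reduce folds the values emitted for each key. This mirrors the shape of MongoDB's mapReduce (where map uses this and a global emit), not its actual API; the page-counting example is illustrative.

```javascript
// Minimal map/reduce over an in-memory list of documents.
function mapReduce(docs, map, reduce) {
  var emitted = {};
  docs.forEach(function (doc) {
    map(doc, function emit(key, value) {       // map phase: emit pairs
      (emitted[key] = emitted[key] || []).push(value);
    });
  });
  var results = {};
  Object.keys(emitted).forEach(function (key) { // reduce phase: fold per key
    results[key] = reduce(key, emitted[key]);
  });
  return results;
}

var pages = [{ "url" : "/home" }, { "url" : "/about" }, { "url" : "/home" }];
var counts = mapReduce(
  pages,
  function (doc, emit) { emit(doc.url, 1); },
  function (key, values) {
    return values.reduce(function (a, b) { return a + b; }, 0);
  }
);
console.log(counts);  // { '/home': 2, '/about': 1 }
```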
Make a Connection
MongoClient mongoClient = new MongoClient();
// or
MongoClient mongoClient = new MongoClient( "localhost" );
// or
MongoClient mongoClient = new MongoClient( "localhost" , 27017 );
// or, to connect to a replica set, supply a seed list of members
MongoClient mongoClient = new MongoClient(Arrays.asList(
new ServerAddress("localhost", 27017),
new ServerAddress("localhost", 27018),
new ServerAddress("localhost", 27019)));
DB db = mongoClient.getDB( "mydb" );
Getting a Collection
DBCollection coll = db.getCollection("testCollection");
Setting Write Concern
mongoClient.setWriteConcern(WriteConcern.JOURNALED);
Inserting a document
BasicDBObject doc = new BasicDBObject("name", "MongoDB").
append("type", "database").
append("count", 1).
append("info", new BasicDBObject("x", 203).append("y", 102));
coll.insert(doc);
Creating An Index
coll.createIndex(new BasicDBObject("i", 1)); // create index on "i", ascending
Dropping A Database
MongoClient mongoClient = new MongoClient();
mongoClient.dropDatabase("myDatabase");
Advanced topics
Capped Collections
A capped collection is a special collection that is created in advance and
is fixed in size.
When it is full, it automatically ages out the oldest documents as new
documents are inserted.
Documents in a capped collection cannot be removed or deleted, and an
update may not replace a document with a larger one.
When the queue is full, the oldest element is replaced by the newest.
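The age-out behavior is that of a fixed-capacity queue: when a new document arrives and the collection is full, the oldest one is dropped. The buffer below is a plain-JavaScript analogy for that behavior, not MongoDB code.

```javascript
// Fixed-capacity buffer that ages out its oldest entry when full.
function CappedBuffer(capacity) {
  this.capacity = capacity;
  this.docs = [];
}
CappedBuffer.prototype.insert = function (doc) {
  if (this.docs.length === this.capacity) {
    this.docs.shift();  // age out the oldest document
  }
  this.docs.push(doc);
};

var buf = new CappedBuffer(3);
[1, 2, 3, 4].forEach(function (n) { buf.insert({ "n" : n }); });
console.log(buf.docs.map(function (d) { return d.n; }));  // [ 2, 3, 4 ]
```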
Advanced topics
GridFS
GridFS is a mechanism for storing large binary files in MongoDB. There
are several reasons why you might consider using GridFS for file storage:
Using GridFS can simplify your stack.
GridFS will leverage any existing replication or autosharding that you've set
up for MongoDB, so getting failover and scale-out for file storage is easy.
GridFS can alleviate some of the issues that certain file systems can exhibit
when being used to store user uploads.
You can get great disk locality with GridFS, because MongoDB allocates data
files in 2GB chunks.
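The core idea behind GridFS is splitting a large file into fixed-size chunks, each stored as its own document alongside a metadata document. The sketch below illustrates only the chunking step in plain JavaScript; the chunk size used here is illustrative, not GridFS's default.

```javascript
// Split file contents into numbered fixed-size chunks.
function toChunks(data, chunkSize) {
  var chunks = [];
  for (var i = 0; i < data.length; i += chunkSize) {
    chunks.push({ "n" : chunks.length, "data" : data.slice(i, i + chunkSize) });
  }
  return chunks;
}

var file = "x".repeat(10);   // stand-in for binary file contents
var chunks = toChunks(file, 4);
console.log(chunks.length);  // 3 chunks: 4 + 4 + 2 bytes
```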
Administration
Monitoring
By default, starting mongod also starts up a (very) basic HTTP server
that listens on a port 1,000 higher than the native driver port.
ServerStatus
> db.runCommand({"serverStatus" : 1})
Master-Slave Replication
Replica Sets
At any point in time, one node in the cluster is primary, and the
rest are secondary.
A secondary node will also write the operation to its own local oplog,
however, so that it is capable of becoming the primary.
Sharding
Autosharding in MongoDB
When to Shard
Shard Keys
Chunks
Reference
Thank You!