
Cross-Platform

Data Synchronization

Dan Grover
Wonder Warp Software LLC

Friday, October 16, 2009 1

Good morning. I’m going to talk today about how you can write your own cross-platform data
synchronization as part of your iPhone apps.
Outline
1 Why Syncing Is Important
2 Syncing Through The Ages
  and why you still might want to write your own
3 Algorithms & Architecture
4 Implementing Sync in Obj-C

Here’s what we’re going to talk about today.


- First, I want to persuade you why data synchronization is important, and why you might
want to add it to your app.
- Next, I’ll explain the ways data syncing has been solved by apps before, the advantages and
disadvantages of various approaches, and explain why you may want to write your own.
- Then we’re going to go over different algorithms that you can use to write your data
synchronization code. I’m going to be very abstract and handwavy because it’s hard to talk
about this kind of stuff when you’re also talking about implementation details.
- Finally, we’ll dig in and talk about how to actually implement this stuff in Objective-C using
the Cocoa APIs available to you.
Who I Am

• Former Northeastern student


• Independent software developer



How I Learned About
Syncing



ShoveBox for Mac


I write an app called ShoveBox for the Mac. (describe)



Last November, I got an email from a friend of mine involved in the local Mobile Monday
group here in Boston. They were going to do a fancy event at the Omni Parker House on “up
and coming” mobile companies. Unfortunately, they couldn’t find enough up and coming
mobile companies, so they asked me to present instead.

“So do you have anything you would like to present?”

At that time, I was mostly focused on Mac software -- I had a game out, but nothing much.
So I said, “Oh, of course, I can demo the new iPhone version of ShoveBox.”

Unfortunately, there was no iPhone version of ShoveBox. I didn’t really want to do one. It was
kind of beyond the scope of the app. And syncing was HAAARRRD.

So I made up a functional prototype of the iPhone app. I added a pretend dialog to the Mac
app to show it syncing. I had a script that I used to convert the example data over so it looked
the same.
?

I actually *did* want to make the iPhone version for real, though. But I had no idea how it was
going to work. I played around with a few half-way solutions -- storing the new entries and
just propagating those on sync. But I realized that real, honest-to-god two-way syncing was
doable if I just sat down and thought about it for a while. I studied all the ways that people
are doing syncing and realized it wouldn’t be too hard to write my own from scratch. Sounds
crazy.

A few months later, I finally shipped the iPhone version. Sales quadrupled, and it got two reviews in
Macworld. There were still some bugs with syncing, but eventually those got ironed out.
Quick Demo



1 Why Syncing Is Important


I’m going to get on my soapbox for a moment and explain briefly why I think this is an
important topic, and how it’s applicable to more apps than you’d think.

Syncing has been something people have been trying to solve for a long time.
If you follow the current hype, we don’t have to worry about it because...
the
CLOUD


...you put everything on the cloud! The cloud will solve all our problems!
The popular conception of the trend of “cloud computing” is a little wrong. People think of it
as a monolithic thing.

But the reality is that the huge benefit of cloud computing is that you can outsource the right
things to the right people. I use one company for sending my email newsletters, because they
have the best infrastructure and software for that. I use another for my regular web hosting,
and yet another to host downloads. And I use a help desk app called Zendesk. So it’s not
really on “the cloud” -- it’s on a lot of clouds!

So we’re back to the same problem -- data is going to be in a ton of different places, and you
have to build systems that can deal with that. Sync plays a big part.
[Slide graphic: several separate clouds, each labeled “A CLOUD”]


So the future’s more complicated than it seems. It’s not “the cloud”, but lots of clouds and
client apps and platforms and apparently goats. And they all have to share data but
operate independently.
Does your app pass the
Green Line Test?


And if you don’t think data synchronization applies to your app, I’d like you all to try this
while you’re in the city. I call it the Green Line Test.

I used to live near Lechmere in East Cambridge, and I’d commute in to classes at
Northeastern using the Green Line. The Green Line touches a lot of areas of Boston and goes
above ground and below. Some of the stations underground are dead, some have reception.
Inevitably, the stations where the train inexplicably stops for 20 minutes are the ones without
reception. You see, they’ve upgraded all the trains and haven’t quite got all the kinks worked out.

If your app is one of those “thin” or “hybrid” apps that needs to make an HTTP request to do
anything, you should try running your app for the entirety of a Green Line ride. How does it
handle it when you lose connectivity for a minute? Pop up an error? Or stall indefinitely? How
good an experience is it? Do you cache things well, or does it always need a connection?

If you find that it’s not very good in this situation, you should consider making more of your
application operate on the device itself, and then sync its state back to the cloud. It will be
more responsive and usable more of the time. You’ve probably avoided something like this
because, well, syncing is a pain. But what I’m going to talk about in this presentation will
help.
2 Syncing Through The Ages
  and Why You Still Might Want To Write Your Own




I thought this tweet from Steven Frank was funny. It’s true. It never works.
I think that’s because there’s not a lot of knowledge about syncing out there. There are a lot
of companies that have written (bad) syncing, and a few academic papers on it. But not a lot
of talk about syncing as a subject. If more people didn’t have to waste all this time learning
the basics for themselves, we could have better syncing as more people work out the kinks
and integrate it in more systems.
Set-Reconciliation
Problem


Academics call syncing the “set reconciliation problem”. You’ve got two sets, and you want to
reconcile their differences. The literature on it is pretty limited though.
rsync



Subversion


Subversion is a kind of syncing a lot of us probably use every day. Like most version control
systems, the idea is that your whole team can have the most current copy of the code.
Data ≠ Files


But it’s important to note that there’s a big difference between syncing *data* and syncing
*files*. Syncing data is a LOT harder!
Dropbox


Dropbox is a consumer file syncing solution. But it actually ends up working a lot more like
Subversion than you’d think. It keeps revisions and actually handles conflicts in a neat way.
HotSync


Palm was one of the first companies to try to make a comprehensive syncing solution for
consumers.

The way HotSync works is that, once you’ve done the first sync, the Palm would set these
status flags on any piece of data that you changed. That would make it really fast to sync
back up with your PC, because the PC had an old copy of the data that both devices had the
last time you synced.
Mac OS X
Sync Services

Sync Services is Apple’s syncing framework. It’s pretty neat, and if you were like me and
trying to write a Mac app that synced with an iPhone app, it would *almost* work.
Sync Services

[Diagram: Your App, Truth Database, Macs]


Sync Services has this concept of a “Truth Database” -- where you replicate all your data so
that it can sync it elsewhere. It gives you lots of goodies to sync your app to the Truth
database -- pushing and pulling changes. They give you tools to define the schema you want
the Truth to keep for your data.

But then it gets magically put on MobileMe and synced to other Macs. You don’t have any
control over that.

The iPhone supports MobileMe, but only for syncing contacts, appointments, and notes. It
doesn’t read in the truth database from Sync Services, it’s totally separate. There is no Sync
Services for the iPhone.

So that’s kind of a bummer.


Two Approaches:

History-Based
Ex-Post-Facto



History-Based
PROS
- Efficient and accurate
CONS
- All client software must maintain status flags/history
- Does not scale as well
- Complicated

Ex-Post-Facto
PROS
- Easy to bolt onto an existing system
- Hot swappable: arbitrary configurations of devices in any state can be synced
CONS
- Syncing can be slower
- Requires accurate date/time info



History-Based      Ex-Post-Facto

Subversion         Rsync
Dropbox            Sync Services
HotSync (Fast)     HotSync (Slow)



When To Write Your Own
• When your schema demands custom handling
  • Dependencies
  • Ordering
• When data needs to be specially converted and prepared for different clients/devices
  • iTunes and iPod Shuffles
• When it’s a core function
3 Algorithms and Architecture



[Venn diagram: sets A and B, overlapping in A∩B]


So in these algorithms, we’re going to be a little abstract and think of this as two sets of data.
- A is all the data that’s on your first device, B is all the data that’s on the second device.
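- Here’s all the data that’s *only on A*. If it was added since the last sync, it needs to be put
on B. If not, B must have deleted it, so it needs to be deleted from A.
- Here’s all the data that’s *only on B*. If it was added since the last sync, it needs to be put
on A. If not, A must have deleted it, so it needs to be deleted from B.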
- Here’s the data that’s on both. This is the trickiest part. We need to sift through this data
and figure out if any of it has been modified since the last sync. We need to merge
modifications when we can, and otherwise, ask the user to resolve the conflict.
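
As a rough illustration of that three-way split, here is a minimal Objective-C sketch. It assumes every object is identified by a UUID string and that ARC and literal syntax are available (both postdate this talk). `PartitionIdentifiers` and the dictionary keys are names made up for the example, not anything from ShoveBox.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits the identifiers known to each device into the
// three regions of the Venn diagram. Both inputs are sets of NSString UUIDs.
static NSDictionary *PartitionIdentifiers(NSSet *idsOnA, NSSet *idsOnB) {
    NSMutableSet *onlyA = [idsOnA mutableCopy];
    [onlyA minusSet:idsOnB];        // on A but not on B

    NSMutableSet *onlyB = [idsOnB mutableCopy];
    [onlyB minusSet:idsOnA];        // on B but not on A

    NSMutableSet *both = [idsOnA mutableCopy];
    [both intersectSet:idsOnB];     // on both devices

    return @{ @"onlyA": onlyA, @"onlyB": onlyB, @"both": both };
}
```

Each of the three resulting sets then gets its own pass, which is where the creation and modification dates come in.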
Goal of a Sync Algorithm

Make Two Sets The Same (duh!)

... in a way consistent with user expectations

... as quickly as possible


So what is the goal of any sync algorithm?


To make both sets of data the same.
Well, that part is pretty easy. I could just erase what’s on your server account and erase
what’s on your iPhone. Done!
Turns out it’s more complicated. There are a lot of *correct* ways to make this happen, but
only some of them are what the user is expecting to see.
The sync also has to be fast. This usually means a minimum of data being transferred.
Three Algorithms

Copy Sync Merge


But there are a few ways to skin a cat. Let’s look at each of these. They all meet the definition
we discussed, but go about it differently.
Copy (before)
A                      B
Good Will Hunting      Good Will Hunting
The Departed           Spenser: For Hire
21                     The Boondock Saints
                       With Honors



Copy (after)
A                      B
Good Will Hunting      Good Will Hunting
The Departed           The Departed
21                     21



Merge (before)
A                      B
Good Will Hunting      Good Will Hunting
The Departed           Spenser: For Hire
21                     The Boondock Saints
                       With Honors



Merge (after)
A                      B
Good Will Hunting      Good Will Hunting
Spenser: For Hire      Spenser: For Hire
The Departed           The Departed
The Boondock Saints    The Boondock Saints
21                     21
With Honors            With Honors

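The copy and merge slides above can be sketched in a few lines of Objective-C. This is only an illustration under assumptions: each device's data is modeled as an NSMutableDictionary keyed by identifier, ARC and literal syntax are available, and `CopyStore` and `MergeStores` are hypothetical names. The slides don't say what merge does with an object that exists on both sides in different versions, so the sketch leaves those untouched.

```objc
#import <Foundation/Foundation.h>

// Copy: make `destination` an exact replica of `source`.
// Anything that was only on the destination is lost.
static void CopyStore(NSDictionary *source, NSMutableDictionary *destination) {
    [destination setDictionary:source];
}

// Merge: non-destructive union. Objects missing from one side are added to
// it and nothing is deleted. Objects already on both sides are left alone.
static void MergeStores(NSMutableDictionary *a, NSMutableDictionary *b) {
    for (NSString *identifier in [a allKeys]) {
        if (b[identifier] == nil) {
            b[identifier] = a[identifier];
        }
    }
    for (NSString *identifier in [b allKeys]) {
        if (a[identifier] == nil) {
            a[identifier] = b[identifier];
        }
    }
}
```

With the movie lists from the slides, `CopyStore(a, b)` leaves both sides with A's three titles, while `MergeStores(a, b)` leaves both sides with all six.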
Sync (before)
last sync = 12PM, now = 3PM

A                                              B
Good Will…    created: 11AM  modified: 11AM    Good Will…    created: 11AM  modified: 11AM
The Departed  created: 2PM   modified: 2PM     Boondock…     created: 1PM   modified: 1PM
21            created: 11AM  modified: 11AM    With Honors   created: 2PM   modified: 2PM

[A slide overlay also shows “modified: 2PM” next to Good Will…; which side it belongs to is not recoverable.]



Sync (after)
last sync = 12PM, now = 3PM

A                                              B
Good Will…    created: 11AM  modified: 11AM    Good Will…    created: 11AM  modified: 11AM
The Departed  created: 2PM   modified: 2PM     The Departed  created: 2PM   modified: 2PM
Boondock…     created: 1PM   modified: 1PM     Boondock…     created: 1PM   modified: 1PM
With Honors   created: 2PM   modified: 2PM     With Honors   created: 2PM   modified: 2PM

[A slide overlay also shows “modified: 2PM” next to Good Will…; which side it belongs to is not recoverable.]



Three Algorithms

Copy Sync Merge


So let’s go back here and talk about when to use each of these algorithms:
SYNC: This is what you’re going to want to do 95% of the time.
The other two algorithms are for when you’re first setting two devices up to sync.
COPY: Some people doing sync like to offer you a choice of data on either device to become
the “one true” set of data.
MERGE: What I do with ShoveBox is just do a merge the first time -- because there might be
data on both devices they want to keep. It avoids any confusion over the choice, and
nobody’s going to be pissed with the initial result.
Needed for Sync

• On each device, each object needs:
  • Creation Date
  • Modification Date
  • UUID



Sync: In Depth
PREPARE
SYNC OBJECTS IN ONLY A
SYNC OBJECTS IN ONLY B
SYNC INTERSECTION
CLEAN UP
PREPARE

Establish communication with sources
Grab summaries from A and B (UUIDs, creation, modification)
Sort into sets



SYNC OBJECTS IN ONLY A

For each object o only in a:
    if o.creation > last sync then
        tell b to copy o over
    else
        tell a to delete o
    end if
next



SYNC OBJECTS IN ONLY B

For each object o only in b:
    if o.creation > last sync then
        tell a to copy o over
    else
        tell b to delete o
    end if
next

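Both one-sided passes apply the same rule, so one function can serve either direction. A minimal Objective-C sketch, assuming ARC; the enum and function names are invented for the example:

```objc
#import <Foundation/Foundation.h>

// What to do with an object that exists on only one of the two sources.
typedef enum {
    OneSidedActionCopyToOtherSide, // created after the last sync: it is new
    OneSidedActionDeleteLocally    // predates the last sync: the other side deleted it
} OneSidedAction;

// Hypothetical helper implementing "if o.creation > last sync".
static OneSidedAction ActionForOneSidedObject(NSDate *creationDate, NSDate *lastSync) {
    if ([creationDate compare:lastSync] == NSOrderedDescending) {
        return OneSidedActionCopyToOtherSide;
    }
    return OneSidedActionDeleteLocally;
}
```

In the earlier example with a last sync at 12PM, The Departed (created 2PM) gets copied across, and 21 (created 11AM) gets deleted.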


SYNC INTERSECTION

For each object o in both a and b:
    if neither a’s nor b’s mod > last sync then
        skip it
    else
        if only a’s mod > last sync then
            propagate a’s version to b
        else if only b’s mod > last sync then
            propagate b’s version to a
        else if both a and b’s mod > last sync then
            present conflict
        end if
    end if
next

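The intersection pass boils down to a four-way decision on two modification dates. Here is one way it might look in Objective-C, assuming ARC; the enum and function names are hypothetical:

```objc
#import <Foundation/Foundation.h>

// Possible outcomes for an object that exists on both sources.
typedef enum {
    IntersectionActionSkip,       // unchanged on both sides
    IntersectionActionPushAToB,   // only A changed
    IntersectionActionPushBToA,   // only B changed
    IntersectionActionConflict    // both changed since the last sync
} IntersectionAction;

// Hypothetical helper: compares each side's modification date to the last sync.
static IntersectionAction ActionForSharedObject(NSDate *modifiedOnA,
                                                NSDate *modifiedOnB,
                                                NSDate *lastSync) {
    BOOL aChanged = [modifiedOnA compare:lastSync] == NSOrderedDescending;
    BOOL bChanged = [modifiedOnB compare:lastSync] == NSOrderedDescending;

    if (aChanged && bChanged) return IntersectionActionConflict;
    if (aChanged) return IntersectionActionPushAToB;
    if (bChanged) return IntersectionActionPushBToA;
    return IntersectionActionSkip;
}
```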


CLEAN UP

tell a and b we’re finished

store current time as last sync



What’s wrong with this?
1. Single last-sync date can cause problems with partial syncs.
   SOLUTION: Sync engine keeps per-item last-sync dates
2. Single modification date makes merging hard
   SOLUTION: Keep per-attribute modification dates on each source



INTERSECTION REVISITED

else if both a and b’s mod > last sync then
    let c = new list of conflicting keys
    let e = new entry record
    for each key k on o
        if a[o].k == b[o].k then
            e.k = a[o].k
        else
            if only a[o].k.mod > o.last sync then
                e.k = a[o].k
            else if only b[o].k.mod > o.last sync then
                e.k = b[o].k
            else
                c += k
            end if
        end if
    next
    if c.count > 0 then
        present conflict to user
        e = a or b, whichever the user picks
    end if
    push e to a and b
next entry
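
The per-attribute merge can be sketched in Objective-C roughly as follows. Assumptions, none of them from ShoveBox itself: both versions of the entry have the same set of keys, values and per-key modification dates arrive as plain dictionaries, ARC and literal syntax are available, and `MergeAttributes` is a made-up name. Conflicting keys provisionally keep A's value until the user has been asked.

```objc
#import <Foundation/Foundation.h>

// Hypothetical per-attribute merge of one entry that changed on both sides.
// valuesA/valuesB map attribute names to values; modsA/modsB map the same
// names to NSDate modification dates. Keys that cannot be merged
// automatically are appended to `conflicts`.
static NSDictionary *MergeAttributes(NSDictionary *valuesA, NSDictionary *modsA,
                                     NSDictionary *valuesB, NSDictionary *modsB,
                                     NSDate *lastSync, NSMutableArray *conflicts) {
    NSMutableDictionary *merged = [NSMutableDictionary dictionary];
    for (NSString *key in valuesA) {
        id a = valuesA[key];
        id b = valuesB[key];
        if ([a isEqual:b]) {
            merged[key] = a;            // same value on both sides
            continue;
        }
        NSDate *aMod = modsA[key];
        NSDate *bMod = modsB[key];
        BOOL aChanged = [aMod compare:lastSync] == NSOrderedDescending;
        BOOL bChanged = [bMod compare:lastSync] == NSOrderedDescending;
        if (aChanged && !bChanged) {
            merged[key] = a;            // only A touched this key
        } else if (bChanged && !aChanged) {
            merged[key] = b;            // only B touched this key
        } else {
            [conflicts addObject:key];  // both (or neither) changed: ask the user
            merged[key] = a;            // provisional value
        }
    }
    return merged;
}
```

If `conflicts` comes back empty, the merged record can be pushed to both sources without bothering the user at all.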
Going Further

• On textual keys, if the same key on the same entry was modified on both sides, use diff
  to do a text merge
• Only ask the user to select one version or the other if there is a text merge conflict



Architecture

[Diagram: a Syncer sitting above two sources, A and B]
[Diagram: the same, with A as a LocalSource on a SQLite DB and B as a Source on a Web Service]
[Diagram: iPhone App and Web Service, with the Web Service in The Cloud]
[Diagram: Mac App and iPhone App]
Sync Source Abstraction

• A sync source supports:
  • Create/Overwrite Object
  • Delete Object
  • Get Object
  • Get summary

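Those four operations map naturally onto an Objective-C protocol. The sketch below is hypothetical and is not the SBSyncSource protocol from ShoveBox. It assumes ARC, objects represented as dictionaries, and that every stored object carries @"created" and @"modified" dates. The in-memory class is just there so a sync engine can be exercised without a database or a network.

```objc
#import <Foundation/Foundation.h>

// Hypothetical protocol mirroring the four operations on the slide.
@protocol SyncSourceSketch <NSObject>
- (void)storeObject:(NSDictionary *)object forIdentifier:(NSString *)identifier; // create or overwrite
- (void)deleteObjectWithIdentifier:(NSString *)identifier;
- (NSDictionary *)objectWithIdentifier:(NSString *)identifier;
// identifier -> @{ @"created": NSDate, @"modified": NSDate }
- (NSDictionary *)summary;
@end

// Simplest possible source: everything lives in a dictionary.
@interface InMemorySyncSource : NSObject <SyncSourceSketch>
@end

@implementation InMemorySyncSource {
    NSMutableDictionary *_objects;
}

- (instancetype)init {
    if ((self = [super init])) {
        _objects = [NSMutableDictionary dictionary];
    }
    return self;
}

- (void)storeObject:(NSDictionary *)object forIdentifier:(NSString *)identifier {
    _objects[identifier] = [object copy];
}

- (void)deleteObjectWithIdentifier:(NSString *)identifier {
    [_objects removeObjectForKey:identifier];
}

- (NSDictionary *)objectWithIdentifier:(NSString *)identifier {
    return _objects[identifier];
}

- (NSDictionary *)summary {
    // Only the dates go into the summary, so the syncer can decide what to
    // do without transferring whole objects.
    NSMutableDictionary *summary = [NSMutableDictionary dictionary];
    for (NSString *identifier in _objects) {
        NSDictionary *object = _objects[identifier];
        summary[identifier] = @{ @"created": object[@"created"],
                                 @"modified": object[@"modified"] };
    }
    return summary;
}

@end
```

Because the syncer only ever talks to the protocol, a SQLite-backed source and a web-service-backed source can be swapped in without touching the algorithm.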


4 Implementing Sync in Objective-C



UUIDs
example:
DBCE017A-AF95-11DE-98BE-228156D89593

how to generate:
CFUUIDRef uuid = CFUUIDCreate(kCFAllocatorDefault);
CFStringRef uuidString = CFUUIDCreateString(kCFAllocatorDefault, uuid);
CFRelease(uuid); // we own uuid; CFRelease(uuidString) too once you’re done with it



Dates

• NSDate is an absolute point in time; it carries no time zone info of its own
• You can compare two NSDate objects or two timestamps
  • UNIX Timestamp (1970)
  • NSDate Timestamp (2001)

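The two epochs matter as soon as a non-Cocoa client or server is involved, since most of them speak UNIX timestamps. A small Objective-C sketch of the conversions and of a date comparison, using only Foundation calls; the helper names are made up for the example and ARC is assumed:

```objc
#import <Foundation/Foundation.h>

// NSDate counts seconds from 2001-01-01 00:00:00 UTC, while UNIX timestamps
// count from 1970-01-01 00:00:00 UTC. Foundation exposes the gap between the
// two as the constant NSTimeIntervalSince1970 (978307200 seconds).

// Date -> UNIX timestamp, e.g. for sending to a web service.
static NSTimeInterval UnixTimestampFromDate(NSDate *date) {
    return [date timeIntervalSince1970];
}

// UNIX timestamp -> date, e.g. for a value received from a web service.
static NSDate *DateFromUnixTimestamp(NSTimeInterval seconds) {
    return [NSDate dateWithTimeIntervalSince1970:seconds];
}

// YES if `date` is strictly later than `other`.
static BOOL DateIsLaterThan(NSDate *date, NSDate *other) {
    return [date compare:other] == NSOrderedDescending;
}
```

Sticking to one representation on the wire (this sketch leans toward UNIX timestamps) avoids off-by-31-years bugs when the other end isn't written in Cocoa.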


Syncing with Core Data
• Set modification date in -willSave
  • Check -isUpdated and -changedValues
  • Don’t update the mod date if it’s just the mod date that changed.
• Set creation date, mod date, and UUID in -awakeFromInsert

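Here is one way those bullets might translate into an NSManagedObject subclass. It is a sketch under assumptions: the entity has attributes named creationDate, modificationDate and uuid (names invented here), ARC is in use, and `SyncedObject` is a hypothetical class. The guard in -willSave is slightly stricter than the slide: it leaves the mod date alone whenever the mod date is among the pending changes. That covers the sync engine writing a date copied from the other device, and it also stops -willSave from re-triggering itself.

```objc
#import <CoreData/CoreData.h>

// Hypothetical base class for every entity that takes part in syncing.
@interface SyncedObject : NSManagedObject
@end

@implementation SyncedObject

// Called once, when the object is first inserted into a context.
- (void)awakeFromInsert {
    [super awakeFromInsert];
    NSDate *now = [NSDate date];
    CFUUIDRef uuid = CFUUIDCreate(kCFAllocatorDefault);
    NSString *uuidString = CFBridgingRelease(CFUUIDCreateString(kCFAllocatorDefault, uuid));
    CFRelease(uuid);
    // Primitive setters make these baseline values rather than undoable changes.
    [self setPrimitiveValue:now forKey:@"creationDate"];
    [self setPrimitiveValue:now forKey:@"modificationDate"];
    [self setPrimitiveValue:uuidString forKey:@"uuid"];
}

// Called before each save, possibly more than once for the same save.
- (void)willSave {
    [super willSave];
    if (![self isUpdated]) {
        return; // only stamp objects Core Data reports as updated
    }
    NSDictionary *changes = [self changedValues];
    if ([changes count] == 0 || changes[@"modificationDate"] != nil) {
        // Nothing changed, or the date was already set for this save
        // (by this method or by the sync engine): leave it alone.
        return;
    }
    [self setValue:[NSDate date] forKey:@"modificationDate"];
}

@end
```

Entities that sync would then use this class, or a subclass of it, as their managed object class in the model.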


Networking
• Protocol choices:
  • HTTP
  • GameKit
  • BEEP/BLIP-based protocol
  • Roll your own (not recommended)
• Using Bonjour/ZeroConf

You have a few choices for your protocol.


If you’re communicating with a server, you can make yourself a web service API. Your sync
source is just wrapping code that makes NSURLRequests.
I made the unfortunate choice of using HTTP locally over the network. Writing an HTTP server
that just has to talk with one other device isn’t too hard, but it was a really dumb
architectural decision. Routers like to screw with it, even when it’s on a non-standard port.
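
For the server case, a sync source’s “create/overwrite” call might build a request like the one below. Everything about the web service here is an assumption made for illustration: the /objects/<identifier> URL layout, the use of PUT, and the JSON body. The sketch assumes ARC and NSJSONSerialization (which postdates this talk), and it only builds the request; sending it is left to whatever networking code the app already has.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds (but does not send) a request asking a web
// service to create or overwrite one object. Returns nil if `object` cannot
// be encoded as JSON (for example, if it contains raw NSDate values).
static NSURLRequest *RequestToStoreObject(NSURL *baseURL,
                                          NSString *identifier,
                                          NSDictionary *object) {
    if (![NSJSONSerialization isValidJSONObject:object]) {
        return nil;
    }
    NSError *error = nil;
    NSData *body = [NSJSONSerialization dataWithJSONObject:object options:0 error:&error];
    if (body == nil) {
        return nil;
    }

    NSURL *url = [[baseURL URLByAppendingPathComponent:@"objects"]
                  URLByAppendingPathComponent:identifier];
    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
    [request setHTTPMethod:@"PUT"];
    [request setValue:@"application/json" forHTTPHeaderField:@"Content-Type"];
    [request setHTTPBody:body];
    return request;
}
```

Dates would need to go over the wire as numbers or strings, since JSON has no date type.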
Some (Bad) Syncing Code from My App



ShoveBox Mac App

SBSyncEngine

SBSyncSource

SBIPhoneSyncSource SBLocalDBSyncSource



- (id) initWithLastSyncDate:(NSDate *)lastSync
                    sourceA:(NSObject<SBSyncSource> *)a
                    sourceB:(NSObject<SBSyncSource> *)b
                  operation:(SBSyncEngineOperation)newOperation;

- (IBAction) start:(id)sender;
- (IBAction) cancel:(id)sender;

- (NSDate *) lastSyncDate;
- (NSString *) currentlySyncingObjectName;
- (SBSyncEngineOperation) operation;

- (NSObject<SBSyncSource> *) sourceA;
- (NSObject<SBSyncSource> *) sourceB;

- (NSObject<SBSyncEngineDelegate> *) delegate;
- (void) setDelegate:(NSObject<SBSyncEngineDelegate> *)theDelegate;



typedef enum SBSyncEngineOperation {
    SBSyncEngineOperationSync = 0,  // Time-based sync A and B
    SBSyncEngineOperationMerge = 1, // Non-destructive merge between A and B
    SBSyncEngineOperationCopy = 2,  // Replace B’s contents with A’s
} SBSyncEngineOperation;



@protocol SBSyncEngineDelegate
- (void) syncEngineFinishedSyncingSuccesfully:(SBSyncEngine *)syncEngine;

- (void) syncEngineDidCancel:(SBSyncEngine *)syncEngine;

- (void) syncEngine:(SBSyncEngine *)syncEngine
    abortedWithError:(NSError *)err;

- (BOOL) syncEngine:(SBSyncEngine *)syncEngine
    pausedWithRecoverableError:(NSError *)err;
// return YES to continue, NO to cancel

- (void) syncEngine:(SBSyncEngine *)syncEngine
    syncedObjects:(NSUInteger)objects
    ofTotal:(NSUInteger)total;

// return the index of the correct choice

- (NSUInteger) syncEngine:(SBSyncEngine *)syncEngine
    encounteredEntryConflictWithA:(NSDictionary *)aEntryInfo
    b:(NSDictionary *)bEntryInfo;

- (NSUInteger) syncEngine:(SBSyncEngine *)syncEngine
    encounteredFolderConflictWithA:(NSDictionary *)aFolderInfo
    b:(NSDictionary *)bFolderInfo;

- (NSUInteger) syncEngine:(SBSyncEngine *)syncEngine
    encounteredSimpleEntityConflictWithKeyPath:(NSString *)keyPath
    aValue:(id)aValue
    bValue:(id)bValue;
@end
Questions/Discussion
