
There are many good answers in this post.

I would like to add my experience as well.

To achieve low latency in Java you have to take control of GC. There are many ways to do that, for example: pre-allocate objects (i.e. use the flyweight design pattern); use primitives - Trove is very good for that, as all its data structures are based on primitives; and reuse object instances, e.g. create a system-wide dictionary to reduce the creation of new objects - a very good option when reading data from a stream/socket/DB.
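A minimal sketch of the pre-allocate-and-reuse idea (the OrderPool and Order names are made up for illustration):

```java
import java.util.ArrayDeque;

// Hypothetical example: a minimal object pool that recycles instances
// instead of allocating new ones, so the GC has (almost) nothing to collect.
final class OrderPool {
    // Mutable, reusable value object - a flyweight-style carrier.
    static final class Order {
        long id;
        double price;
        void clear() { id = 0; price = 0.0; }
    }

    private final ArrayDeque<Order> free = new ArrayDeque<>();

    OrderPool(int size) {
        // Pre-allocate everything up front, before the latency-critical phase.
        for (int i = 0; i < size; i++) free.push(new Order());
    }

    Order acquire() {
        Order o = free.poll();
        return (o != null) ? o : new Order(); // grow only if exhausted
    }

    void release(Order o) {
        o.clear();      // scrub state before reuse
        free.push(o);
    }
}
```

The point is that after warm-up the hot path calls acquire/release only, so allocation (and hence collection) stops.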

Try to use wait-free algorithms (which is a bit difficult) and lock-free algorithms. You can find tons of examples of these.
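As a small illustration of the lock-free style (a hypothetical LockFreeMax tracker; the compare-and-set retry loop is the standard pattern from java.util.concurrent.atomic):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a lock-free accumulator: threads make progress
// with a CAS retry loop instead of taking a lock, so no thread ever blocks.
final class LockFreeMax {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Records a new sample; retries only if another thread raced us.
    void record(long value) {
        long current;
        do {
            current = max.get();
            if (value <= current) return;             // nothing to update
        } while (!max.compareAndSet(current, value)); // CAS retry loop
    }

    long get() { return max.get(); }
}
```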

Use in-memory computing. Memory is cheap; you can have terabytes of data in memory.

If you can master bit-wise algorithms, they give very good performance.

Use Mechanical Sympathy - refer to the LMAX Disruptor, an excellent framework.

Low latency is a function of many things, the two most important ones being:

network latency - i.e. the time taken on the network to transmit/receive messages.

processing latency - i.e. the time taken by your application to act on a message/event.

So, if you are, say, writing an Order Matching system, the network latency would represent how soon within your network you were able to receive the order matching request. And the processing latency would represent the time taken by your application to match the order against existing, open orders.
Multicast, UDP, reliable multicast, and kernel bypass (supported by Java 7, Informatica Ultra Messaging, and many others) on InfiniBand networks are some common technologies used by companies in this field.
Additionally, there are low latency programming frameworks like disruptor
(http://code.google.com/p/disruptor/) which implement design patterns for dealing with
low latency applications. What could kill you is having to write to a DB or log files as part
of your main workflow. You will have to come up with unique solutions that fulfill the
requirements of the problem you are trying to solve.
In languages like Java, implementing your app such that it creates (almost) zero garbage becomes extremely important to latency. As Adamski says, having a knowledge of the Java memory model is extremely important. Understand the different JVM implementations and their limitations. Typical Java design patterns around small-object creation are the first things that you will throw out of the window - one can never fix the Java garbage collector enough to achieve low latency; the only thing that can be fixed is the garbage.

Good luck!

I work for a financial company that produces low latency software for communication
directly with exchanges (for submitting trades and streaming prices). We currently
develop primarily in Java. Whilst the low latency side isn't an area I work in directly I
have a fair idea of the skillset required, which would include the following in my opinion:

Detailed knowledge of the Java memory model and techniques to avoid unnecessary garbage collection (e.g. object pooling). Some of the techniques used might typically be regarded as "anti-patterns" in a traditional OO environment.

Detailed knowledge of TCP/IP and UDP multicast including utilities for debugging
and measuring latency (e.g. DTrace on Solaris).

Experience with profiling applications.

Knowledge of the java.nio package, experience developing NIO-based scalable server applications, and experience designing wire protocols. Also note that we typically avoid using frameworks and external libraries (e.g. Google Protobuf), preferring to write a lot of bespoke code.

Knowledge of FIX and commercial FIX libraries (e.g. Cameron FIX).
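As a rough sketch of the java.nio point above - a single-threaded, non-blocking accept loop, which is the shape most bespoke NIO servers start from (class and method names are invented for illustration):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.*;

// Hypothetical skeleton of an NIO-based server: one thread, one Selector,
// all channels non-blocking.
final class NioServer {
    static Selector start(int port) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.configureBlocking(false);          // never block the event loop
        server.bind(new InetSocketAddress(port)); // port 0 = ephemeral
        server.register(selector, SelectionKey.OP_ACCEPT);
        return selector;
    }

    static void pollOnce(Selector selector) throws IOException {
        // selectNow() returns immediately - typical for busy-spin event loops.
        if (selector.selectNow() == 0) return;
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isAcceptable()) {
                SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
                client.configureBlocking(false);
                client.register(selector, SelectionKey.OP_READ);
            }
            // OP_READ handling (parsing the wire protocol) would go here.
        }
        selector.selectedKeys().clear();
    }
}
```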


Unfortunately many of the skills can only be developed "on the job", as there's no substitute for the experience gained implementing a price server or trading engine based on a spec from an exchange or vendor. However, it's also worth mentioning that our company at least tends not to look for specific experience in this (or other) niche areas, instead preferring to hire people with good analytical and problem-solving skills.

Typically, working in low-latency environments means having an understanding of call dependencies and how to reduce them to minimize the dependency chain. This includes the use of data structures and libraries to store desired cacheable data, as well as refactoring existing resources to reduce interdependencies.

In addition to Martijn's comments I'd add:


1. Warm up your JVM. Bytecode starts off being interpreted for Hotspot and then gets compiled on the server after 10K observations. Tiered Compilation can be a good stop-gap.
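A warm-up phase can be as simple as driving the hot path in a loop before go-live (hypothetical names; the 10K figure is HotSpot's default server-compiler threshold, tunable via -XX:CompileThreshold):

```java
// Hypothetical warm-up sketch: exercise the critical path enough times
// before trading starts that Hotspot compiles it.
final class Warmup {
    static double hotPath(double price, double qty) {
        return price * qty;                  // stands in for the real transaction flow
    }

    static double warmUp(int iterations) {
        double sink = 0;                     // consume results so the JIT can't
        for (int i = 0; i < iterations; i++) // dead-code-eliminate the loop
            sink += hotPath(i, 2.0);
        return sink;
    }
}
```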

2. Classloading is a sequential process that involves IO to disk. Make sure all the classes for your main transaction flows are loaded upfront and that they never get evicted from the perm generation.
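A sketch of eager classloading (the Preloader name is made up; Class.forName with initialize=true also runs static initializers up front rather than on first use):

```java
// Hypothetical sketch: force-load and initialize the classes on the critical
// path at startup, so no classloading I/O happens mid-transaction.
final class Preloader {
    static int preload(String... classNames) {
        int loaded = 0;
        for (String name : classNames) {
            try {
                // initialize=true runs static initializers now, not on first use
                Class.forName(name, true, Preloader.class.getClassLoader());
                loaded++;
            } catch (ClassNotFoundException e) {
                System.err.println("Missing class on critical path: " + name);
            }
        }
        return loaded;
    }
}
```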

3. Follow the "Single Writer Principle" to avoid contention and the queueing effects implied by Little's Law; also study Amdahl's Law to understand what can be made parallel and whether it is worth it.
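A minimal illustration of the Single Writer Principle (hypothetical class; the lazySet trick is safe only because exactly one thread ever writes, so no CAS or lock is needed):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one designated thread mutates the state, so there is
// never any contention; any number of other threads may read.
final class SingleWriterCounter {
    private final AtomicLong count = new AtomicLong();
    private final Thread owner;

    SingleWriterCounter(Thread owner) { this.owner = owner; }

    void increment() {
        if (Thread.currentThread() != owner)
            throw new IllegalStateException("only the owning thread may write");
        // Plain read + ordered write: no CAS retry, no full memory fence.
        count.lazySet(count.get() + 1);
    }

    long read() { return count.get(); }   // any thread may read
}
```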

4. Model your business domain and ensure all your algorithms are O(1) or at least O(log n). This is probably the biggest cause of performance issues in my experience. Make sure you have performance tests to cover the main cases.
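For example, indexing orders by id rather than scanning a list (names invented for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the O(1) point: the same lookup done against
// a List<Order> would be an O(n) scan that grows with the order book.
final class OrderLookup {
    static final class Order {
        final long id;
        final double price;
        Order(long id, double price) { this.id = id; this.price = price; }
    }

    private final Map<Long, Order> byId = new HashMap<>();

    void add(Order o) { byId.put(o.id, o); }    // O(1) amortised insert

    Order find(long id) { return byId.get(id); } // O(1) expected lookup
}
```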

5. Low latency in Java is not just limited to Java. You need to understand the whole stack your code is executing on. This will involve OS tuning, selecting appropriate hardware, and tuning the systems software and device drivers for that hardware.

6. Be realistic. If you need low latency, don't run on a hypervisor. Ensure you have sufficient cores for all threads that need to be in the runnable state.

7. Cache misses are your biggest cost to performance. Use algorithms that are cache friendly and set affinity to processor cores, either with taskset or numactl for a JVM, or via JNI for individual threads.

8. Consider an alternative JVM, like Zing from Azul, with a pause-less garbage collector.

9. Most importantly, get someone involved with experience. This will save you so much time in the long run. Shameless plug :-)

Real-time and low-latency are distinctly separate subjects, although often related. Real-time is about being more predictable than fast. In my experience the real-time JVMs, even the soft real-time ones, are slower than the normal JVMs.

There are a bunch of things to be aware of, yes. I'm in Crete at the moment with limited net access so this will be (fairly) short. Also, I'm not a low-latency expert, but several of my colleagues play one in real life :-).
1. You need to appreciate Mechanical Sympathy (a term coined by Martin Thompson). In other words, you need to understand what your underlying hardware is doing. Knowing how CPUs load cache lines, what their read/write bandwidth is, the speed of main memory and much, much more is very important. Why? Because you'll need to reason about how your Java source code affects the operating system/hardware via the runtime JVM. For example, is the way your field variables are laid out in your source code causing cache line evictions (costing you ~150 clock cycles)? Hmmm... :-)
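One classic instance of field layout hurting you is false sharing: two counters written by two different threads landing on the same 64-byte cache line, so each write evicts the line from the other core's cache. A hedged sketch using manual padding (field names are arbitrary; exact layout is JVM-dependent, which is why Java 8 later added @Contended for this):

```java
// Hypothetical sketch of avoiding false sharing by padding a hot field
// onto its own cache line. The padding longs are never read or written;
// they exist only to separate the hot fields in memory.
final class Counters {
    static final class Padded {
        volatile long value;
        // ~56 bytes of padding after the 8-byte value field.
        long p1, p2, p3, p4, p5, p6, p7;
    }

    final Padded a = new Padded(); // intended to be written by thread A only
    final Padded b = new Padded(); // intended to be written by thread B only
}
```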

2. Generally you want lock-free algorithms and I/O. Even the most well-designed concurrent application that uses locks is at risk of blocking, and blocking in low latency is generally bad :-).

3. Understand object allocation and garbage collection. This is a massive topic, but basically you want to avoid GC pauses (often caused by the stop-the-world nature of various GC collections). Specialist GC collectors like the Azul collector can in many cases solve this problem for you out of the box, but for most people they need to understand how to tune the Sun/Oracle GCs (CMS, G1, etc.).
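By way of illustration only, these are the kinds of HotSpot flags involved in such tuning (pick one collector, not both; the actual values must come from measuring your own application):

```
# Illustrative HotSpot flags - not a recommended configuration.
-XX:+UseConcMarkSweepGC                 # CMS: shorter pauses than the throughput collector
-XX:+UseG1GC -XX:MaxGCPauseMillis=10    # or G1 with a pause-time goal
-Xms4g -Xmx4g                           # fixed heap size avoids resize pauses
-verbose:gc -XX:+PrintGCDetails         # always log GC so you can see the pauses
```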

4. The Hotspot JIT is freaking amazing. Learn about its optimizations, but generally speaking all of the good OO techniques (encapsulation, small methods, as much immutable data as possible) will allow the JIT to optimize, giving you the sorts of performance levels that well-crafted C/C++ code gives you.

5. Overall system architecture. Be aware of the network, how machines are co-located, whether you're connected to the exchange via fiber, etc.

6. Be aware of the impact of logging. Logging in binary, or using coded output that you can parse offline, is probably a good idea.
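A sketch of the binary-logging idea - fixed-width fields written into a pre-allocated buffer on the hot path, with all formatting deferred to an offline decoder (the BinaryLog class and its field layout are invented for illustration):

```java
import java.nio.ByteBuffer;

// Hypothetical binary logger: no String formatting and no allocation on
// the hot path; a separate offline tool decodes the records to text.
final class BinaryLog {
    private final ByteBuffer buf;

    BinaryLog(int capacity) {
        buf = ByteBuffer.allocateDirect(capacity); // allocated once, up front
    }

    // One fixed-width record: timestamp, order id, price.
    void log(long timestampNanos, long orderId, double price) {
        buf.putLong(timestampNanos).putLong(orderId).putDouble(price);
    }

    int bytesLogged() { return buf.position(); }
}
```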

Overall I highly recommend going on Kirk Pepperdine's Java Performance Tuning course [Disclaimer: I teach this course myself, so I'm biased]. You'll get good coverage of the various aspects of the JVM and its impact on the underlying O/S and hardware.

PS: I'll try to revisit this later and tidy it up some.

The main difference with low latency timings is that:

Every microsecond counts. You will have an idea of how much each microsecond costs your business per year, and how much time it is worth spending to reduce each microsecond.

You want to measure the highest 99% or even 99.99% latencies (the worst 1% or 0.01% respectively).

You want a fast clock, which is often limited to one host, or even one socket. (You can measure low latency between hosts with specialist hardware.) For multi-millisecond timings you can relatively easily measure between hosts (with just NTP configured).

You want to minimise garbage, especially in your measurements.

It is quite likely you will need to develop application-specific tools which are embedded into the application and run in production. You can use profilers as a start, but most ultra-low-latency applications don't show anything useful in commercial profilers (nor do they GC much, if at all, when running).

You can have a read of my blog for general low latency, high performance testing practices (some of these are nano-second based). Vanilla Java
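A minimal sketch of the percentile-measurement idea above (hypothetical LatencyRecorder; samples go into a pre-allocated array so the measurement itself creates no garbage, and sorting happens only after the measured run):

```java
import java.util.Arrays;

// Hypothetical tail-latency recorder: capture per-operation timings from
// System.nanoTime() and report 99%/99.99% percentiles, not averages.
final class LatencyRecorder {
    private final long[] samples;
    private int count;

    LatencyRecorder(int capacity) { samples = new long[capacity]; }

    void record(long startNanos, long endNanos) {
        if (count < samples.length) samples[count++] = endNanos - startNanos;
    }

    // e.g. percentile(99.0) or percentile(99.99); called offline, after the run
    long percentile(double pct) {
        long[] sorted = Arrays.copyOf(samples, count);
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(pct / 100.0 * count) - 1;
        return sorted[Math.max(0, Math.min(idx, count - 1))];
    }
}
```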
