The Atom uses a new architecture, but one built on older techniques. It's the first in-order x86 from Intel since the original Pentium, back in 1993. All other Intel processors (since the P6) use an out-of-order architecture.
In-Order: Say what?
To simplify, think of the processor as receiving instructions one by one and placing them in its pipeline before executing them. In an in-order architecture, instructions are executed in the order in which they arrive, whereas an out-of-order architecture can change the order in the pipeline. The advantage is that stalls can be limited. If, for example, you have a simple calculation instruction, a memory access, then another simple calculation, an in-order architecture will execute the three operations one after the other, whereas an out-of-order processor can execute the two calculations at the same time as the memory access, with an obvious time saving. Surprisingly, whereas in-order architectures generally use a short pipeline, the Atom has a 16-stage pipeline, which can be a disadvantage in certain cases.
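To make the difference concrete, here is a toy cycle-count model of the example above. This is only a sketch: the single-issue model and the latencies are illustrative assumptions, not Atom figures.

```python
# Toy model of in-order vs. out-of-order issue (illustration only; the
# latencies and the issue model are assumptions, not Atom specifics).

def in_order_cycles(ops):
    """Each operation runs to completion before the next one starts."""
    return sum(latency for _, latency in ops)

def out_of_order_cycles(ops):
    """Independent operations may overlap: a long memory access can
    proceed while later, independent calculations execute."""
    calc = [lat for kind, lat in ops if kind == "calc"]
    mem = [lat for kind, lat in ops if kind == "mem"]
    # Calculations issue back to back; memory accesses overlap with them.
    return max(sum(calc), sum(mem))

# The article's example: calculation, memory access, calculation.
program = [("calc", 1), ("mem", 3), ("calc", 1)]
print(in_order_cycles(program))      # 5 cycles, strictly sequential
print(out_of_order_cycles(program))  # 3 cycles: both calcs hide under the load
```

Under these toy latencies, reordering hides the two one-cycle calculations behind the three-cycle memory access.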
HyperThreading
HyperThreading is a technology that first appeared with the Pentium 4. It processes two threads simultaneously by using otherwise idle parts of the pipeline. While not as efficient as two true cores, the technology makes the OS believe the CPU can process two threads simultaneously and increases the computer's overall performance. On the Atom, with its long pipeline coupled with an in-order architecture, HyperThreading is very effective, and the technology can significantly increase performance without impacting the TDP. Intel claims an increase in power consumption of only 10%.
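Why a long in-order pipeline benefits so much from a second thread can be sketched with a toy model: when one thread stalls, the other thread's instructions fill the bubbles. All numbers below are illustrative assumptions, not Atom measurements.

```python
# Toy model of SMT filling pipeline bubbles (illustration only).

def cycles_single_thread(work, stall_every, stall_len):
    """One thread: every `stall_every` instructions, the pipeline
    bubbles for `stall_len` cycles (e.g. waiting on memory)."""
    stalls = work // stall_every
    return work + stalls * stall_len

def cycles_two_threads(work, stall_every, stall_len):
    """Two threads sharing the pipeline: assume a bubble left by one
    thread can be filled by the other's ready instructions, bounded by
    how much work the other thread actually has."""
    total = 2 * work
    bubbles = (total // stall_every) * stall_len
    covered = min(bubbles, work)  # the other thread can only contribute its own work
    return total + bubbles - covered

print(cycles_single_thread(100, 10, 5))  # 150 cycles for 100 instructions
print(cycles_two_threads(100, 10, 5))    # 200 cycles for 200 instructions
```

In this toy model, one thread achieves 1.5 cycles per instruction, while two threads together achieve 1.0: the same hardware retires 50% more work per cycle, which is in the spirit of why SMT pays off on the Atom.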
The processing core
For the rest, the Atom is equipped with two ALUs (units that perform integer calculations) and two FPUs (units dedicated to floating-point calculation, and very important for gaming, for example). The first ALU handles shift operations, and the second handles jumps. All multiplication and division operations, even on integers, are automatically sent to the FPUs. The first FPU is simple and limited to addition, while the second handles SIMD and multiply/divide operations. Note that the first FPU is used in conjunction with the second for 128-bit calculations (each unit is 64 bits wide).
Intel Has Optimized the Basic Instructions
If you look at the number of cycles necessary to execute instructions, you notice something: some instructions are fast and others are (very) slow. A mov or an add, for example, executes in one cycle, as on a Core 2 Duo, whereas a multiplication (imul) takes five cycles, compared to only three on the Core architecture. Worse, a 32-bit floating-point division takes 31 cycles, compared to only 17 (or almost half as many) on a Core 2 Duo. In practice (and Intel willingly admits this), the Atom is optimized to execute basic instructions quickly, meaning that this processor short-changes performance on complex instructions. This can be checked simply with Everest (for example), which includes a tool for measuring instruction latencies.
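The same methodology that tools like Everest use (a tight loop, many repetitions, best-of-N timing) can be sketched at a higher level. Note the caveat: Python's timeit measures interpreter-level cost, not raw CPU instruction latency, so this only illustrates the measurement approach, not Atom cycle counts.

```python
# Sketch of latency-style measurement: time a cheap operation against a
# more expensive one. Interpreter overhead dominates here, so treat the
# numbers as methodology illustration only.
import timeit

def best_of(stmt, setup="a, b = 12345, 678", number=100_000, repeat=5):
    """Return the lowest-noise time for `number` executions of `stmt`."""
    return min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))

t_add = best_of("a + b")  # a cheap operation
t_div = best_of("a / b")  # a more expensive operation
print(f"add: {t_add:.6f}s  div: {t_div:.6f}s")
```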
Intel has chosen a fairly out-of-the-ordinary organization for the Atom, but without sacrificing performance (which is
important with a CPU using an in-order architecture).
It seems that this move to 8-transistor cells was made late in the game, when the design of the processor was fairly advanced, which meant that the size of the cache had to be reduced to fit it in, which explains the 24 kB for the data cache. This unofficial explanation was advanced by AnandTech in their article introducing the Atom in April.
Power consumption is central to this Intel platform, and they've made a lot of effort in that department. Aside from the chipset, which consumes a lot of power in comparison to the processor, the Atom itself has many attractive features.
C6 power state
In addition to the CPU's low voltage (1.05 V), the Atom also introduces a new standby mode, C6. As a reminder, the C modes (C0 to C6) are low-power states, and the higher the number, the less the CPU consumes. In C6 mode, the processor is almost totally disabled; only a cache memory of a few kB (10.5) is kept enabled to store the state of the registers. In this mode, the L2 cache is emptied and disabled, the supply voltage falls to only 0.3 V, and only a small part of the processor remains active, for wake-up purposes. The processor can enter C6 mode in approximately 100 microseconds, which is quick. In practice, Intel claims, C6 mode is used 90% of the time, which limits overall power consumption (obviously, if you launch a program that requires a lot of CPU power, or even watch a Flash video, you won't stay in that mode).
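On a current Linux system, the idle states the kernel knows about can be listed through the cpuidle sysfs interface. A hedged sketch: the path below is the standard Linux location, but whether the directory exists, and which states appear, depends on the kernel and the hardware, so the function returns an empty list where cpuidle is not exposed.

```python
# List the CPU idle (C-) states exposed by Linux's cpuidle sysfs
# interface, if present on this machine.
from pathlib import Path

def list_cstates(cpu=0):
    """Return (name, description) pairs for each idle state of `cpu`,
    or an empty list if cpuidle is not available."""
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpuidle")
    states = []
    if base.is_dir():
        for state in sorted(base.glob("state*")):
            name = (state / "name").read_text().strip()
            desc = (state / "desc").read_text().strip()
            states.append((name, desc))
    return states

for name, desc in list_cstates():
    print(f"{name}: {desc}")
```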
We should point out, though, that the two chipsets to be used with the Atom are power-hungry: the Atom 230 uses an i945GC that consumes 22 W (versus 4 W for the CPU), and the Atom N270 ships with an i945GSE that burns 5.5 W (versus 2.4 W for the CPU).
In Practice
So is the Atom really low-power in practice? The processor is, yes. For the platform aimed at nettops (low-cost desktop computers), the answer is yes, but... Why the but? Because the chipset draws a lot of power, and the processor is listed at a TDP of 4 W, compared to 2.4 W for the mobile versions. Our test motherboard consumed 59 W at idle, and we reached 62 W under maximum load (with a 3.5" hard disk and a 1 GB DDR2 DIMM). Obviously, these values are what we measured for the complete platform, not just the motherboard, and they don't take power-supply losses into account (our test unit has an efficiency of approximately 80%). That's both a little and a lot: it's not much for a desktop computer, of course, but it's a lot in absolute terms. We should add that we recently tested a motherboard based on a 1.5 GHz Via C7, and that configuration drew less power with the same components: 49 W at idle and 59 W under load (always measured at the AC outlet).
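Since all of the figures above are wall-socket readings, the power actually reaching the components can be estimated from the supply's efficiency. A quick sketch of the arithmetic (the 80% efficiency is this article's estimate for its test unit; real efficiency varies with load):

```python
def dc_power(ac_watts, efficiency=0.80):
    """Estimate DC-side power from a wall-socket (AC) reading."""
    return ac_watts * efficiency

print(dc_power(59))  # idle: about 47.2 W actually reaching the components
print(dc_power(62))  # full load: about 49.6 W
```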
Conclusion
8:30 PM - June 5, 2008 by Pierre Dandumont
What conclusion should we draw about the Atom platform? We came away with a mixed impression. The processor itself is a success: it's affordable, consumes very little power, and while its performance is weak, it's sufficient for its target market (low-cost PCs intended for Web use). In addition, HyperThreading is a good feature and the platform is responsive. But for us, the disappointment is the associated chipsets. Intel offers only two choices, and they're open to criticism. The SCH Poulsbo seems efficient and autonomous, but it's not viable in a standard PC due to its MID orientation (no SATA, for example), whereas the i945GC and i945GSE chipsets are usable in PCs, but they're throwbacks: they lack functions, their 3D performance is disastrous (even as more and more applications are using 3D), and they consume significantly more power than the processor itself.
You get the feeling that Atom is only a trial balloon, one that's a success from some points of view and a failure from others. Will computer manufacturers and the general public go for it? Undoubtedly, and for two reasons: price and marketing. The platform will make it possible to offer computers at a very low price, and for now Atom has a good brand image. The public's reasoning might proceed something like this:
"An Eee PC 900 for $450 (good) with a Celeron (not good) at 900 MHz (not good)"
or
"An Eee PC 901 for $450 (good) with an Atom (good) at 1.6 GHz (good)"
In other words, the Atom version will appeal more to the general public, even if in practice the difference is likely to be pretty slim.
A paradoxical platform: the processor is a success (even if its performance is weak in absolute terms), whereas the associated chipsets are not worth their salt. Overall, the gains compared to older platforms remain slim, and we hope that Intel will offer better-suited chipsets in the future.
Pros
Cons
The chipsets
A mismatched platform
Bonnell microarchitecture
Main article: Bonnell (microarchitecture)
Intel Atom processors are based on the Bonnell microarchitecture,[3][4] which can execute up to two instructions per cycle. Like many other x86 microprocessors, it translates x86 instructions (CISC instructions) into simpler internal operations (sometimes referred to as micro-ops, i.e., effectively RISC-style instructions) prior to execution. The majority of instructions produce one micro-op when translated, with around 4% of instructions used in typical programs producing multiple micro-ops. The number of instructions that produce more than one micro-op is significantly lower than in the P6 and NetBurst microarchitectures. In the Bonnell microarchitecture, internal micro-ops can contain both a memory load and a memory store in connection with an ALU operation, thus being more similar to the x86 level and more powerful than the micro-ops used in previous designs.[28] This enables relatively good performance with only two integer ALUs, and without any instruction reordering, speculative execution, or register renaming. The Bonnell microarchitecture therefore represents a partial revival of the principles used in earlier Intel designs such as the P5 and the i486, with the sole purpose of enhancing the performance-per-watt ratio. However, Hyper-Threading is implemented in an easy (i.e., low-power) way to employ the whole pipeline efficiently by avoiding the typical single-thread dependencies.[28]
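The difference in micro-op granularity can be sketched with a toy decoder. The decompositions below are schematic, not real decoder output; they only illustrate that a read-modify-write x86 instruction such as `add [mem], eax` splits into separate load/ALU/store micro-ops on a P6-style design but can remain a single fused micro-op on a Bonnell-style design.

```python
# Toy decoders contrasting micro-op granularity (schematic illustration;
# real decoders emit machine-specific encodings, not strings).

def decode_p6_style(instr):
    """P6-style: split a read-modify-write instruction into load,
    ALU operation, and store micro-ops."""
    if instr == "add [mem], eax":
        return ["load tmp, [mem]", "add tmp, eax", "store [mem], tmp"]
    return [instr]

def decode_bonnell_style(instr):
    """Bonnell-style: keep load + ALU op + store together as one
    micro-op closer to the original x86 instruction."""
    return [instr]

print(len(decode_p6_style("add [mem], eax")))       # 3 micro-ops
print(len(decode_bonnell_style("add [mem], eax")))  # 1 micro-op
```

Fewer, richer micro-ops mean fewer slots consumed per x86 instruction, which is how Bonnell sustains reasonable throughput with only two ALUs and no reordering.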
Hyperthreading
Although existing operating system and application code should run correctly on a processor that supports Intel HT Technology, some code modifications are recommended to get the optimum benefit. These modifications are discussed in Chapter 7, "Multiple-Processor Management," of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.
The current program, task, or procedure executes a JMP or CALL instruction to a TSS descriptor in the GDT.
The current program, task, or procedure executes a JMP or CALL instruction to a task-gate descriptor in the GDT or the current LDT.
2. Checks that the current (old) task is allowed to switch to the new task. Data-access privilege rules apply to JMP and CALL instructions. The CPL of the current (old) task and the RPL of the segment selector for the new task must be less than or equal to the DPL of the TSS descriptor or task gate being referenced. Exceptions, interrupts (except for interrupts generated by the INT n instruction), and the IRET instruction are permitted to switch tasks regardless of the DPL of the destination task-gate or TSS descriptor. For interrupts generated by the INT n instruction, the DPL is checked.
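The privilege rule above can be expressed as a small predicate. This is only a sketch: real hardware checks many more conditions (descriptor type, presence, limits), and the initiator names here are illustrative labels, not architectural terms.

```python
def task_switch_allowed(cpl, rpl, dpl, initiator="call"):
    """Apply the DPL check for a task switch.

    JMP/CALL and software interrupts (INT n) are privilege-checked:
    both CPL and the selector's RPL must be <= the target's DPL.
    Exceptions, hardware interrupts, and IRET bypass the DPL check.
    """
    if initiator in ("exception", "hw_interrupt", "iret"):
        return True
    return cpl <= dpl and rpl <= dpl

print(task_switch_allowed(cpl=3, rpl=3, dpl=0, initiator="call"))       # False: user code cannot switch to a DPL-0 task
print(task_switch_allowed(cpl=3, rpl=3, dpl=0, initiator="exception"))  # True: exceptions bypass the DPL check
```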
3. Checks that the TSS descriptor of the new task is marked present and has a valid limit (greater than or equal to 67H).
4. Checks that the new task is available (call, jump, exception, or interrupt) or busy (IRET return).
5. Checks that the current (old) TSS, new TSS, and all segment descriptors used in the task switch are paged into system memory.
6. If the task switch was initiated with a JMP or IRET instruction, the processor clears the busy (B) flag in the current (old) task's TSS descriptor; if initiated with a CALL instruction, an exception, or an interrupt, the busy (B) flag is left set. (See Table 7-2.)
7. If the task switch was initiated with an IRET instruction, the processor clears the NT flag in a temporarily saved image of the EFLAGS register; if initiated with a CALL or JMP instruction, an exception, or an interrupt, the NT flag is left unchanged in the saved EFLAGS image.
8. Saves the state of the current (old) task in the current task's TSS. The processor finds the base address of the current TSS in the task register and then copies the states of the following registers into the current TSS: all the general-purpose registers, segment selectors from the segment registers, the temporarily saved image of the EFLAGS register, and the instruction pointer register (EIP).
9. If the task switch was initiated with a CALL instruction, an exception, or an interrupt, the processor will set the NT flag in the EFLAGS loaded from the new task. If initiated with an IRET instruction or JMP instruction, the NT flag will reflect the state of NT in the EFLAGS loaded from the new task (see Table 7-2).
10. If the task switch was initiated with a CALL instruction, JMP instruction, an exception, or an interrupt, the processor sets the busy (B) flag in the new task's TSS descriptor; if initiated with an IRET instruction, the busy (B) flag is left set.
11. Loads the task register with the segment selector and descriptor for the new task's TSS.
12. The TSS state is loaded into the processor. This includes the LDTR register, the PDBR (control register CR3), the EFLAGS register, the EIP register, the general-purpose registers, and the segment selectors. A fault during the load of this state may corrupt architectural state.
13. The descriptors associated with the segment selectors are loaded and qualified. Any errors associated with this loading and qualification occur in the context of the new task and may corrupt architectural state.
NOTES
If all checks and saves have been carried out successfully, the processor commits to the task switch. If an unrecoverable error occurs in steps 1 through 11, the processor does not complete the task switch and ensures that the processor is returned to its state prior to the execution of the instruction that initiated the task switch.
If an unrecoverable error occurs in step 12, architectural state may be corrupted, but an attempt will be made to handle the error in the prior execution environment. If an unrecoverable error occurs after the commit point (in step 13), the processor completes the task switch (without performing additional access and segment availability checks) and generates the appropriate exception.
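The flag effects in steps 6, 7, 9, and 10 can be summarized in a small model of the Table 7-2 behavior. This is a sketch: the dictionary keys and initiator names are illustrative labels, not architectural names.

```python
def task_switch_flags(initiator):
    """Model the busy (B) and NT flag effects of a task switch,
    per steps 6, 7, 9, and 10 above."""
    call_like = initiator in ("call", "exception", "interrupt")
    return {
        # Step 6: JMP/IRET clear the old task's busy flag;
        # CALL, exceptions, and interrupts leave it set.
        "old_busy_cleared": initiator in ("jmp", "iret"),
        # Step 7: only IRET clears NT in the saved EFLAGS image.
        "saved_nt_cleared": initiator == "iret",
        # Step 9: CALL-like switches set NT in the new task's EFLAGS.
        "new_nt_set": call_like,
        # Step 10: the new task's busy flag ends up set in every case
        # (set by CALL/JMP/exception/interrupt; already set, and left
        # set, for IRET).
        "new_busy_set": True,
    }

print(task_switch_flags("iret"))
print(task_switch_flags("call"))
```

The model makes the nesting logic visible: a CALL-like switch marks the new task as nested (NT set) and keeps the old task busy so it can be returned to, while IRET unwinds that state.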
Base: A base alone represents an indirect offset to the operand. Since the value in the base register can change, it can be used for dynamic storage of variables and data structures.
...created when a procedure is entered. Here, the EBP register is the best choice for the base register, because it automatically selects the stack segment. This is a compact encoding for this common function.