
DESIGN GUIDE

Cloud Ready Data Center Network Design Guide

Copyright 2011, Juniper Networks, Inc.


Table of Contents
Introduction
Scope
History of the Modern Data Center
    Application Evolution
    Server Platform Evolution
    Infrastructure Evolution
    Operational Models Evolution
Types of Data Centers
    Transactional Production Data Center Network
    Content and Hosting Services Production Data Center Network
    High-Performance Compute (HPC) Production Data Center Network
    Enterprise IT Data Centers
    Small and Midsize Business IT Data Center
The New Role of the Network
Design Considerations
    Physical Layout
    Top of Rack/Bottom of Rack
    Virtual Chassis
    End of Row
    Middle of Row
Cloud Data Center Network Design Guidance
    Physical Network Topologies
    Single Tier Topology
    Multitier Topologies (Access-Core)
    Access-Core Mesh Design
    Resiliency Design and Protocols
    Application Resiliency Design
    Application Resource Pools
    Critical Application Resources
    Server Link Resiliency Design
    Server Link Resiliency
    Network Device Resiliency
    Virtual Machine Mobility
    Network Device Resiliency
    Hot-Swappable Interfaces
    Unified In-Service Software Upgrades
    Unified ISSU Methodology
    Redundant Switching Fabric
    Redundant Routing Engine
    Network OS Resiliency and Reliability Features
    Routing and Forwarding on Separate Planes
    Modular Software
    Single Code Base
    Graceful Routing Engine Switchover (GRES)
    Nonstop Active Routing
    Nonstop Bridging/Nonstop Routing
    Network Resiliency Designs and Protocols
    Access-Core Inverse U Loop-Free Design
    Multichassis Link Aggregation Groups
    Redundant Trunk Groups
    Virtual Chassis at the Core
    Layer 3 Routing Protocols
    Multiple Spanning Tree Protocol
    MPLS at Core Level for Large Deployments
Agility and Virtualization
    Logical Systems
    Virtual Routers
    Logical Routers
    Virtual Chassis
    VLANs
    Security Zones
    MPLS VPNs
Capacity Planning, Performance, and Scalability
    Throughput
    Oversubscription
    Latency
    Modular Scalability
    Port Capacity
    Software Configuration
Solution Implementation: Sample Design Scenarios
    Scenario 1: Enterprise Data Center
    Scenario 2: Transactional Data Center
Summary
About Juniper Networks

Table of Figures
Figure 1. Top of rack deployment
Figure 2. Virtual Chassis in a top of row layout
Figure 3. Dedicated Virtual Chassis daisy-chained ring
Figure 4. Virtual Chassis braided ring cabling
Figure 5. Extended Virtual Chassis configuration
Figure 6. End of row deployment
Figure 7. Middle of row deployment
Figure 8. Single tier network topology
Figure 9. Access-core hub and spoke network topology
Figure 10. Access-core inverse U network topology
Figure 11. Access-core mesh network topology
Figure 12. Application resource pools
Figure 13. Critical application resources
Figure 14. Server link resiliency overview
Figure 15. Separate control and forwarding planes
Figure 16. Resiliency with access-core inverse U
Figure 17. Multichassis LAG configuration
Figure 18. RTG configuration
Figure 19. Virtual Chassis core configuration
Figure 20. Layer 3 configuration
Figure 21. MSTP configuration
Figure 22. MPLS design
Figure 23. Network with logical systems
Figure 24. VPLS switching across data centers
Figure 25. Agility designs for the data centers
Figure 26. Traditional versus virtual appliance architecture
Figure 27. Simplified data center
Figure 28. Use Case: Enterprise data center
Figure 29. Use Case: Transactional data center


Introduction
Data centers have evolved over the past several decades from single point, concentrated processing centers to dynamic, highly distributed, rapidly changing, virtualized centers that provide myriad services to highly distributed, global user populations. A powerful new cloud computing paradigm has emerged in which users request and receive information and services dynamically over networks from an abstracted set of resources. The resources are somewhere out there in the cloud. Users don't care where or how the resources are provided; they only care that their applications, data, and content are available when needed, automatically, and at the desired level of quality and security. As demands for cloud computing services grow and change at an ever increasing pace, it becomes critical for data center planners to address both current and evolving needs, and to choose a flexible, dynamic cloud data center design that can effectively meet the complex challenges of the global information age. To aid in this process, it will be helpful to look back at how applications, computing platforms, infrastructure, and operations have changed since modern data centers were first introduced, and what those changes mean for today's data center designers. We will find a common theme: that the network is the key to the new data center and the most critical element to consider in data center design.

Scope
At the beginning of this guide, we review the history of modern data centers. The remainder of this guide introduces the design principles, physical layouts, network topologies, and protocols that provide the foundation of the cloud-ready data center network. We present a comprehensive set of guidelines for data center design, and consider how the guidelines can be applied in practice to two different data center types: an enterprise data center and a transactional data center. Finally, we describe how to use Juniper products and solutions to design data centers that fully realize the promise of cloud computing.

This Design Guide is intended for the following personnel:

• Customers in the enterprise and public sector
• Service providers
• Juniper partners
• IT and network industry analysts
• Individuals in sales, network design, system integration, technical support, product development, management, and marketing who have an interest in data center design

History of the Modern Data Center


Application Evolution
Modern data centers began in the 1960s as central locations to house huge mainframe computers and the associated storage devices and other peripherals. A typical data center consisted of large, air conditioned rooms of costly equipment with highly trained staff assigned to keep the equipment running 24x7x365. Each compute job was handled by a single mainframe computer within a single data center. Users accessed the computing resources via timeshare through dumb terminals operating at very low bit rates over telephone lines. The data center was all about the computer. This remained true even when minicomputers were introduced in the 1970s and some data centers were downsized to serve individual departments. Client/server systems were introduced in the 1980s as the first stage in distributed computing. Application processing was distributed between the server and client, with communication between them by way of proprietary protocols. Each company had its own application server and associated application clients (either running client software or operating as a client terminal). The server provided the application and associated database, while the client provided the presentation layer and local data processing. The network began to play an important role in client/server communications; however, within the data center itself, processing for a single application was still done on a single machine, and networking within the data center was of secondary concern.


The 1990s saw the explosion of Internet communications and the full arrival of the global information age. Orders-of-magnitude increases in bandwidth, ubiquitous access, powerful routing protocols, and dedicated hardware allowed applications to break out of the single computer or single server model. Applications started to function in multiple tiers, in which each tier provided specialized application services. The application tier performed application processing, drawing from the storage tier, while the presentation tier supported user interaction through a web interface. Application development relied on standard languages (Java), tiers communicated with each other using standard protocols (TCP/IP), and standard presentation protocols (HTTP) provided support across many types of platforms and web browsers. Backend processing became modular, with greatly increased communications within LANs and WANs. Application evolution has since seen the rise of service-oriented architectures (SOAs), in which each application client handles a piece of information collected from multiple systems, and multiple servers may deliver a single service. Each application is delivered as a menu of services, and new applications are constructed from existing applications in a process known as mashup. For example, a credit transaction application may be constructed from a set of applications: one that handles the user interaction, another that performs a credit check, and another that processes the actual transaction. Each application may have an independent web presence that resides in multiple locations. With SOAs, large, custom, single-purpose applications have been replaced by smaller, specialized, generic functions that provide services to multiple applications. For example, an application service provider may offer a single credit check application to other application providers. And full applications, which consist of multiple single-purpose applications, are now offered as a service to users. For example, a company no longer needs to purchase equipment and software to provide customer relationship management (CRM) functionality. Instead, it can purchase CRM services from a company such as Salesforce that in turn depends on other specialized application functions. All of these developments result in complete dependence of applications on the network: within the data center, between data centers, and between data centers and users. As application demand increases, more nodes must be added to the compute cycle, and all nodes must be interconnected and remain synchronized. Application users now have high expectations of availability, performance, and overall user experience, and support for quality of service (QoS) and service-level agreements (SLAs) is a must. The network must be able to offer predictable performance and quality, while handling increasingly complex communication flows and patterns.

Server Platform Evolution


From the 1960s to the early 2000s, the story of platform evolution was one of staggering increases in compute power coupled with a continual reduction in size, power requirements, and cost. It is difficult to overstate the impact of these changes on all aspects of business, commerce, government, and technology. However, throughout this period, direct connection was maintained between logical systems and physical computer systems: an individual platform always corresponded to a single type of logical entity. It was possible to partition machines (disk or operating system partitions), but the partitions were always of the same type. This changed in the early 2000s with the introduction of virtual machine technology. It became possible to take a single physical platform and slice it into multiple virtual machines, each of which runs its own operating system and is logically separate from other virtual machines on the same platform. It is also possible to take multiple physical platforms and combine them into a single logical platform to increase processing power and other capabilities. With x86-based virtual machines, data centers have been able to significantly reduce hardware costs and free up space for additional capacity. The decoupling of logical and physical systems has had a major impact on the design and delivery of applications and services within the data center. It is no longer possible to design security and understand potential failure points based on the physical cable connections between machines; it is necessary to understand the relationship between platforms and services. It has become critically important to understand how the network links physical systems and how the physical systems operate relative to the logical systems to which they are mapped. Virtual systems affect the size of Layer 2 domains, cross-connecting servers and network systems, security, load balancing, high availability, and QoS for communication between nodes.


Infrastructure Evolution
In addition to individual platform evolution and the introduction of machine virtualization, there has been an evolution in how physical servers and network elements are designed and deployed. On the server side, the advent of virtualization and low cost x86-based servers has led to the introduction of blade servers and server clusters. A blade server is a chassis that houses individual blade machines that are connected by an internal Ethernet switch. Server clusters use a back channel InfiniBand connection that allows separate physical servers to operate as a single node. To support these configurations, the network must extend itself at the control level to communicate with the proprietary networks within the blade server or cluster. On the network side, converged I/O and Converged Enhanced Ethernet (CEE) are driving changes in how servers and storage systems connect within the data center. Converged I/O allows a server to have fewer physical interfaces, with each interface supporting multiple logical interfaces. For example, within a blade server, a few physical interfaces may support bridge connections, LAN connections, and compute connections for a variety of protocols. The evolving CEE standards effort is focused on adding lossless transport capabilities to Ethernet communications with the goal of extending converged I/O to server/storage links. The physical communications infrastructure is also undergoing significant changes. As the current standard of 1/10 Gbps for communication within the data center grows to 40/100 Gbps over the decade, data centers will need to rely more and more on fiber capacity and will need to develop cost-effective cabling plans.

Operational Models Evolution


Driven by user demand and technology, applications, services, and the associated production and delivery mechanisms have grown in ways that would have been inconceivable decades ago. But much of this development has come at the cost of ever increasing complexity. As traditional boundaries erode within servers, among servers, between servers and storage systems, and across data centers, the job of managing the various components becomes exponentially more difficult. Business processes must be mapped to a complex and dynamic infrastructure. Security and accountability become major concerns, especially when sensitive information such as personal or financial data must be transmitted over inherently insecure physical links through a menu of specialized application services offered by multiple application providers. Initially, information systems development outpaced business process improvements, and essential governance and control functions were lacking. Accountability has now improved, thanks to changes such as those introduced with government regulation; however, many operational challenges remain. Information now flows from many directions, and is touched by many parties and systems. Data center operators must account for their own systems and also make sure that third-party application services are reliable and accountable. For cloud computing to succeed, it must be possible to bring the management of applications, platforms, infrastructure, and operations under a common orchestration layer. The orchestration systems must support the myriad components that comprise the cloud solution and provide common management tools that can assure reliable and secure service production and delivery. The systems must meet existing and emerging standards and regulatory and compliance requirements. Overall, the orchestration layer must provide a robust framework to support the continually evolving cloud network.


Types of Data Centers


Not all data centers are the same. There are two general categories: production data centers and IT data centers. Production data centers are directly linked to revenue generation, whereas IT data centers provide support functions for business operations. In addition, data center requirements and design can vary widely according to use, size, and desired results. Some data centers are designed for the lowest possible latency and highest possible availability, while others require comprehensive attention to QoS, scale, and high availability. And some limit feature support to control costs. This section provides examples of the most common data center types.

Transactional Production Data Center Network


High-speed computing is critical to the success and profitability of today's financial institutions. At high scale, every nanosecond of latency can mean the difference between profit and loss. Therefore, businesses such as financial services cannot afford any performance or security risks. Typically, a financial services network is extremely complex and employs numerous devices and services to support a high-performance computing infrastructure. The design of the production data center must address specific requirements to guarantee the predictability and low latency required by trading platforms, algorithmic trading applications, and market data distribution systems.

Content and Hosting Services Production Data Center Network


With the emergence of cloud computing and with more services being migrated to an IP-based delivery model, production data centers are playing a more critical role in hosting content and services. For example, hosting providers are now offering cloud-based infrastructure as a service. A broad array of online retail options has emerged that positions the Internet as a key business enabler. In addition, pure play cloud providers offer extensive development and delivery platforms. These business models impose a strict set of data center requirements. High availability is required to keep the online services available 24x7x365, and the ability to deliver high-volume traffic is required for the customer experience. From the provider's perspective, functions in the network are needed to support new business models. For example, the data center must support changing workload distribution and deliver new workloads to the end user. The security infrastructure must support virtualization technology and be granular enough to handle specific applications and users. The network must also support applications that run across multiple sites while retaining a consistent security policy across the whole environment.

High-Performance Compute (HPC) Production Data Center Network


Scientific innovation in emerging fields such as nanotechnology, biotechnology, genetics, and seismic modeling is driving production data center requirements. With the aid of general-purpose hardware, many organizations are finding it more cost-effective to leverage grid computing (High Performance Compute Clusters or HPC/C) for their intensive compute tasks. This technology is based primarily on grouping multiple CPUs together, distributing tasks among them, and collectively completing large calculations. In the network, 10GbE offers distinct benefits, not just for performance but also in terms of its low cost-to-bandwidth ratio.

Enterprise IT Data Centers


Enterprise data centers are found across a wide variety of industries, including healthcare, retail, manufacturing, education, and energy and utilities. The enterprise data center has traditionally been designated as an IT cost center, in which the primary objective is to be a business enabler, providing access to business applications, resources, and services (such as Oracle, CRM, ERP, and others) for employees and other network users. Major requirements include high availability and low latency performance to enhance productivity and the user experience. Enterprise data center designers typically look for leading-edge, innovative solutions, such as server virtualization, I/O convergence, and Multiprotocol Label Switching (MPLS)/virtual private LAN service (VPLS) for multisite connectivity, and the data center network must support these technologies.


Small and Midsize Business IT Data Center


With today's advanced, high-priced networking technologies, most small and midsize businesses (SMBs) face serious challenges in being able to afford the latest IT data center infrastructure technologies while still trying to remain competitive and profitable. Challenges include the operational overhead associated with implementation and adoption of new technologies. Scalability and security are also major concerns. SMBs need a reliable option for build-out of the data center network that is cost-effective and easy to deploy and manage. Juniper recognizes the importance of these SMB challenges by offering a low cost, less invasive approach for deploying a cloud-ready common switching and routing infrastructure based on fewer required devices.

The New Role of the Network


The growing importance of the network has been a central theme in the history of modern computing. Now, with cloud computing, the network has become paramount. Everything now runs over a network: within and across systems in the data center, between data centers, and between data centers and users. Within the data center, the network drives and enables virtualization, consolidation, and standardization. Globally, the network serves as the delivery and access vehicle. Without the network, there is no cloud.

Design Considerations
In the face of exponentially increasing complexity in compute and networking systems, it becomes critical to design data centers that reduce complexity. Juniper addresses this concern with an approach and products that greatly simplify data center design, deployment, and operation. The Juniper strategy optimizes designs in the following dimensions, each of which enables data centers to meet important application delivery objectives:

1. Simplify. Simplifying the data center network means minimizing the number of network elements required to achieve a particular design, thus reducing both capital and operating costs. Simplifying also means streamlining data center network operations with consistently implemented software and controls.

2. Share. Sharing the data center network means intelligently (and in many cases dynamically) partitioning the infrastructure to support diverse applications and user groups, and interconnecting large pools of resources with maximum agility. In many cases, this involves powerful virtualization technologies that allow multiple logical operations to be performed on individual physical entities (such as switches, routers, and appliances).

3. Secure. Securing the data center network extends protection to support the rich, distributed architectures that many applications currently use. This requires a robust, multidimensional approach that enhances and extends traditional perimeter defenses. Increasing the granularity and agility of security policies enables trusted sharing of incoming information and resident data within the data center, while complementing the functions embedded in operating systems and applications.

4. Automate. Automating means the ability to capture the key steps involved in performing management, operational, and application tasks, and embedding task execution in software that adds intelligence to the overall data center operation. Tasks can include synchronizing configurations among multiple disparate elements, starting and stopping critical operations under various conditions, and diagnosing or profiling operations on the dimensions that are important for managers to observe (see the sketch following this list).

With this high-level design framework in mind, we can now discuss the individual functional components of the cloud data center and their associated requirements and enabling technologies.
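
As one small, hypothetical illustration of the Automate dimension, the following Junos OS event policy sketch reacts to a link-down trap by collecting interface diagnostics automatically. The policy name and the diagnostic command are assumptions for this sketch, not part of any reference design in this guide; lines beginning with # are annotations.

    # React to a link-down trap by capturing interface state for later diagnosis
    set event-options policy uplink-diag events SNMP_TRAP_LINK_DOWN
    set event-options policy uplink-diag then execute-commands commands "show interfaces extensive"

Similar policies can start or stop operations or apply configuration changes, which is the kind of embedded task execution described above.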


Physical Layout
Planning the physical data center layout is an important first step in designing the data center. The data center is usually divided into multiple physical segments, commonly referred to as segments, zones, cells, or pods. Each segment consists of rows of racks containing equipment that provides compute resources, data storage, networking, and other services. In this section, we consider various physical layout options. Major factors to consider include cabling requirements, cable length restrictions, power and cooling requirements, operations, and management. After the basic segment is specified, the same physical layout can be replicated across all segments of the data center or in multiple data centers. This modular design approach improves the scalability of the deployment, while reducing complexity and enabling efficient management and operations. The physical layout of networking devices in the data center must balance the need for efficiency in equipment deployment with restrictions on cable lengths and other physical considerations. There are trade-offs to consider between deployments in which network devices are consolidated in a single rack versus deployments in which devices are distributed across multiple racks. Adopting an efficient solution at the rack and row levels ensures efficiency of the overall design as racks and rows are replicated throughout the data center. This section considers the following data center layout options:

• Top of rack/bottom of rack
• End of row
• Middle of row

Top of Rack/Bottom of Rack


In a top of rack/bottom of rack deployment, network devices are deployed in each server rack (as shown for top of rack in Figure 1). A single device (or a pair of devices for redundancy at the device level) provides switching for all of the servers in the same rack. To allow sufficient space for servers, the devices in the rack should be limited to a 1 U or 2 U form factor. The Juniper Networks EX4200 Ethernet Switch and QFX3500 Switch support top of rack/bottom of rack deployments.

[Figure: server racks with EX4200/QFX3500 network devices installed at the top of each rack, above the compute and storage devices.]

Figure 1. Top of rack deployment


This layout places high-performance devices within the server rack in a row of servers in the data center. With devices in close proximity, cable run lengths are minimized. Cable lengths can be short enough to accommodate 1GbE, 10GbE, and future 40GbE connections. There is also potential for significant power savings for 10GbE connections when the cable lengths are short enough to allow the use of copper, which operates at one-third the power of longer fiber runs.


With top of rack/bottom of rack layouts, it is easy to provide switching redundancy on a per rack basis. However, note that each legacy device must be managed individually, which can complicate operations and add expense, since multiple discrete 24- or 48-port devices are required to meet connectivity needs. Both top of rack and bottom of rack deployments provide the same advantages with respect to cabling and switching redundancy. Top of rack deployments provide more convenient access to the network devices, while bottom of rack deployments can be more efficient from an airflow and power perspective, because cool air from underfloor HVAC systems reaches the network devices in the rack before continuing to flow upward. Top of rack/bottom of rack deployments have some disadvantages, however. Because the devices serve only the servers in a single rack, uplinks are required for connections between the servers in adjacent racks, and the resulting increase in latency may affect overall performance. Agility is limited because modest increases in server deployment must be matched by the addition of new devices. Finally, because each device manages only a small number of servers, more devices are typically required than would otherwise be needed to support the server population. Juniper has developed a solution that delivers the significant benefits of top of rack/bottom of rack deployments while addressing the issues mentioned above. The solution, Virtual Chassis, is described in the next section.

Virtual Chassis
Juniper's approach of virtualizing network devices using Virtual Chassis delivers all of the benefits of top of rack/bottom of rack deployments while also reducing management complexity, providing efficient forwarding paths for server-to-server traffic, and reducing the number of required uplinks. A single Virtual Chassis supports up to 10 devices using cross-connects. From a management perspective, multiple devices become one logical device. This approach simplifies management by reducing the number of logically managed devices, and it offers agile options for the number and deployment of uplinks. It also allows servers to support network interface card (NIC) teaming using link aggregation groups (LAGs) with multiple members of the same Virtual Chassis configuration. This increases the total server network bandwidth, while also providing up to 9:1 server link redundancy. Figure 2 illustrates a Virtual Chassis using two devices in a top of rack deployment.

[Figure: two switches in the rack form a Virtual Chassis (RE0 as Virtual Chassis master, RE1 as Virtual Chassis backup), joined by a 64 Gbps dedicated Virtual Chassis connection, with 10-Gigabit Ethernet uplinks from each member.]

Figure 2. Virtual Chassis in a top of row layout
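
To make the NIC teaming point concrete, the following hedged Junos OS sketch bundles one server-facing port on each member of a two-member Virtual Chassis into a single LACP link aggregation group. The interface numbers, LAG name (ae0), and VLAN name are assumptions for illustration only.

    # Allow aggregated Ethernet (LAG) interfaces on the Virtual Chassis
    set chassis aggregated-devices ethernet device-count 1
    # One physical member link on Virtual Chassis member 0, one on member 1
    set interfaces ge-0/0/10 ether-options 802.3ad ae0
    set interfaces ge-1/0/10 ether-options 802.3ad ae0
    # Run LACP on the bundle and place it in the server VLAN
    set interfaces ae0 aggregated-ether-options lacp active
    set interfaces ae0 unit 0 family ethernet-switching port-mode access
    set interfaces ae0 unit 0 family ethernet-switching vlan members servers
    set vlans servers vlan-id 100

Because the two physical links terminate on different Virtual Chassis members, the server keeps connectivity if either member fails, while the pair is still managed as one logical switch.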


Juniper supports flexible placement of EX4200 devices as part of a Virtual Chassis configuration. Possible deployments include members in a single rack, across several racks, in the same wiring closet, or spanning wiring closets across floors, buildings, and facilities. When interconnecting devices through dedicated Virtual Chassis ports, the physical distance between two directly connected devices may not exceed 5 meters, which is the maximum Virtual Chassis port cable length. A Virtual Chassis configuration can be extended by using uplink ports configured as Virtual Chassis ports to allow a greater distance between two directly connected member devices.

There are three cabling methods for interconnecting devices in a Virtual Chassis configuration: daisy-chained ring, braided ring, and extended Virtual Chassis configuration, as described in the following subsections. We recommend that devices in a Virtual Chassis configuration be connected in a ring topology for resiliency and speed. A ring configuration provides up to 128 Gbps of bandwidth between member devices.

Daisy-Chained Ring

In the daisy-chained ring configuration, each device in the Virtual Chassis configuration is connected to the device immediately adjacent to it. Members at the end of the Virtual Chassis configuration are connected to each other to complete the ring topology. Connections between devices can use either Virtual Chassis port on the back of a device (for example, VCP 0 to VCP 0 or VCP 0 to VCP 1). The daisy-chained ring configuration provides a simple and intuitive method for interconnecting devices. The maximum height or breadth of the Virtual Chassis is 5 meters.

[Figure: Virtual Chassis members cabled in a daisy-chained ring over dedicated Virtual Chassis ports; maximum extent 5 m.]

Figure 3. Dedicated Virtual Chassis daisy-chained ring
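
A hedged sketch of defining and verifying such a Virtual Chassis follows. Preprovisioning pins each member's ID and role to its serial number; the serial numbers shown are placeholders, and the ring itself is cabled as described above.

    # Preprovision member IDs and roles so mastership is deterministic
    set virtual-chassis preprovisioned
    set virtual-chassis member 0 serial-number <serial-0> role routing-engine
    set virtual-chassis member 1 serial-number <serial-1> role routing-engine
    set virtual-chassis member 2 serial-number <serial-2> role line-card

    # From operational mode, confirm that all members have joined and that both
    # Virtual Chassis ports on each member are up (a closed ring)
    show virtual-chassis status
    show virtual-chassis vc-port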


Braided Ring

In the braided ring cabling configuration, alternating devices in the Virtual Chassis configuration are connected to each other. The two device pairs at each end of the Virtual Chassis configuration are directly connected to each other to complete the ring topology. Connections between devices can use either Virtual Chassis port on the back of a device. The braided ring configuration extends the Virtual Chassis height or breadth to 22.5 meters.

[Figure: Virtual Chassis members cabled in a braided ring over dedicated Virtual Chassis ports; maximum extent 22.5 m.]

Figure 4. Virtual Chassis braided ring cabling


Extended Virtual Chassis Configuration

The extended Virtual Chassis configuration allows the interconnection of individual Virtual Chassis members or dedicated Virtual Chassis configurations across distances of up to 40 km with redundant fiber links. This configuration is used when deploying a Virtual Chassis configuration across wiring closets, data center racks, data center rows, or facilities. In this configuration, optional EX-UM-2XFP or EX-UM-4SFP uplink modules or fixed small form-factor pluggable transceiver (SFP) base ports in the EX4200-24F are used to interconnect the members of the Virtual Chassis. Multiple uplinks can be used for additional bandwidth and link redundancy.

Note: Beginning with Juniper Networks Junos operating system 9.3, the 24 fixed Gigabit Ethernet SFP base ports in the EX4200-24F device can be configured as Virtual Chassis ports to extend Virtual Chassis configurations.

[Figure: two dedicated Virtual Chassis configurations of EX4200 switches, Virtual Chassis Location #1 and Virtual Chassis Location #2, interconnected by redundant Gigabit Ethernet or 10-Gigabit Ethernet Virtual Chassis extension links.]

Figure 5. Extended Virtual Chassis configuration
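
As a hedged illustration of how uplink ports are repurposed for an extended Virtual Chassis, the following operational commands convert an uplink port into a Virtual Chassis port on each of the two members being joined. The PIC slot, port number, and member ID are assumptions for this sketch and depend on the installed uplink module.

    # On the member at location #1: make uplink port 0 in PIC slot 1 a Virtual Chassis port
    request virtual-chassis vc-port set pic-slot 1 port 0
    # On the member at location #2 (addressed by its member ID): do the same
    request virtual-chassis vc-port set pic-slot 1 port 0 member 6

The fiber link between the two converted ports then carries Virtual Chassis traffic between the locations.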


End of Row
In an end of row configuration, devices are deployed in a network device-only rack at the end of a row to support all of the servers in the row. In this configuration, which is common in data centers with existing cabling, high-density devices are placed at the end of a row of servers. End of row configurations can support larger form factor devices than top of rack/bottom of rack configurations. They also require fewer uplinks and simplify the network topology. Because they require cabling over longer distances than top of rack/bottom of rack configurations, they are best for deployments that involve 1GbE connections and relatively few servers. The EX4200 Ethernet Switch and the Juniper Networks EX8200 line of Ethernet switches support end of row deployments.

[Figure: a row of server racks with EX8200 network devices consolidated in a rack at the end of the row.]

Figure 6. End of row deployment


Traditional modular chassis devices have commonly been used in end of row deployments, where cable lengths are relatively long between servers and network devices. Cable lengths may exceed the length limits for 10GbE/40GbE connections, so careful planning is required to accommodate high-speed network connectivity. Device port utilization is suboptimal with traditional chassis-based devices, and most consume a great deal of power and cooling capacity, even when not fully configured or utilized. In addition, these large chassis-based devices may take up a great deal of valuable data center space.

Middle of Row
A middle of row deployment is exactly like an end of row deployment, except that the devices are deployed in the middle of the row instead of at the end. This configuration provides some advantages over an end of row deployment, such as the ability to reduce cable lengths to support 10GbE/40GbE server connections. High-density, large form-factor devices are supported, fewer uplinks are required in comparison with top of rack deployments, and a simplified network topology can be adopted. The EX4200 line and EX8200 line support middle of row deployments.

[Figure: a row of server racks with EX8200 network devices consolidated in a rack in the middle of the row.]

Figure 7. Middle of row deployment


You can configure a middle of row network device rack so that devices with cabling limitations are installed in the racks that are closest to the device rack. While this option is not as flexible as the top of rack deployment, it supports greater scalability and agility than the end of row deployment.

Cloud Data Center Network Design Guidance


Table 1. Juniper Products for Top of Rack, End of Row, and Middle of Row Deployments
LAYOUT                         BANDWIDTH       EX8200    EX4200    QFX3500
Top of rack/Bottom of rack     1G                        X
                               10G                                 X
                               Mixed 1G/10G              X         X
End of row                     1G              X         X
                               10G             X
                               Mixed 1G/10G    X         X
Middle of row                  1G              X         X
                               10G             X
                               Mixed 1G/10G    X         X


Table 1 lists Juniper Networks products and shows how they support the physical layouts that we have discussed. Each product has been designed for flexibility and easy integration into the data center.

• The EX8200 line of Ethernet switches delivers the performance, scalability, and carrier-class reliability required for today's high-density enterprise data center and campus aggregation and core environments, as well as high-performance service provider interconnects. EX8200 Ethernet line cards are specifically designed to optimize enterprise applications.

• The EX4200 Ethernet Switch combines the high availability and carrier-class reliability of modular systems with the economics and flexibility of stackable platforms, delivering a high-performance, scalable solution for data center, campus, and branch office environments. Offering a full suite of L2 and L3 switching capabilities as part of the base software, the EX4200 satisfies a variety of high-performance applications, including branch, campus, and data center access deployments as well as GbE aggregation deployments.

• The high-performance Juniper Networks QFX3500 Switch addresses a wide range of deployment scenarios, which include traditional data centers, virtualized data centers, high-performance computing, network-attached storage, converged server I/O, and cloud computing. Featuring 48 dual-mode small form-factor pluggable transceiver (SFP+/SFP) ports and four quad small form-factor pluggable plus (QSFP+) ports in a 1 U form factor, the QFX3500 Switch delivers feature-rich Layer 2 and Layer 3 connectivity to networked devices such as rack servers, blade servers, storage systems, and other switches in highly demanding, high-performance data center environments. For converged server edge access environments, the QFX3500 is also a standards-based Fibre Channel over Ethernet (FCoE) transit switch and FCoE to Fibre Channel (FCoE-FC) gateway, enabling customers to protect their investments in existing data center aggregation and Fibre Channel storage area network (SAN) infrastructures.

Physical Network Topologies


After addressing the physical layout, the next stage in data center design is to consider the topologies that will connect the network devices. The decision about topology involves issues related to cabling (with associated distance limitations), latency, network path resiliency to avoid single points of failure, and the use of link management protocols for resiliency (with loop detection and prevention, if needed). This section considers four types of physical network topologies:

• Single tier topology
• Access-core hub and spoke topology
• Access-core inverse U topology
• Access-core mesh design topology

Single Tier Topology


In a single tier network topology, each server pool connects directly to each logical switch in a single tier, creating a complete connection mesh. This simple design has low overhead and is highly efficient. Because there is a single switching tier, loops cannot occur, and all traffic forwarding decisions are highly optimized by way of internal mechanisms. Traffic flow is controlled by configuration changes on the servers and devices. No special resiliency protocols are required. Each device in the single tier topology must support L2 and L3 functions as well as virtualization features such as VLANs and virtual routers for logical separation. This approach is highly agile, because resources can move while retaining connection to the same devices. The single tier topology supports easy integration with services and edge connectivity, providing consistency, low latency, and simplified operations. In principle, any single device (or device pair for resiliency) that provides complete network connectivity, such as the EX8200 line of Ethernet switches, can operate effectively in a single tier design. With currently available technologies and products, however, the single tier topology may not scale to meet the requirements of today's data centers.


Figure 8 shows a single tier topology that has been architected for resiliency. Each device has internal resiliency capabilities, and the devices are connected through multiple redundant links.

Figure 8. Single tier network topology


Multitier Topologies (Access-Core)
Multitier designs are able to meet the scalability needs of large and expanding data centers. Each multitier design includes an access tier that provides the connection to server/storage pools, and a core tier that provides the switching infrastructure for access devices and the connection to the external network. Each access-core configuration becomes a replicable pod within the data center, providing connectivity and resiliency for the servers and storage that it services. Using Juniper's simplified design approach, the data center can be built out by adding as many pods as needed to support the required capacity and services.

To make the multitier design resilient, each device is connected to multiple devices in the other tier. While this increases network resiliency, it also introduces loops in the network. Link management protocols are required to provide logical loop-free flow for traffic forwarding. This section describes the physical multitier topologies; the associated link management protocols are discussed in the Resiliency Design and Protocols section below.

Access-Core Hub and Spoke

The access-core hub and spoke design addresses the scalability concerns associated with a single tier topology while retaining the same basic structure. This design includes pairs of access devices and a pair of core devices supporting server/storage pools. Each server/storage pool connects to both access devices in the pair to provide resiliency for the server-to-access device link. (See Server Link Resiliency Design below for a description of server link resiliency options.) In the northbound direction, each access device connects to both core devices, and these are linked to each other. There is no single point of failure at the access or core tier, because if an access or core device fails, traffic from the server can still reach the other device. By adding additional access device pairs with similar uplink connections to the core devices, this design can provide network connections for a greater number of compute/storage resources.

The access-core hub and spoke design effectively addresses the scale limitations of the single tier design, but at the cost of greater complexity. Because each access device connects to each core device and the core devices are linked, loops can occur within the access-core tiers. In this context, link management protocols are required for resiliency and to provide loop detection and prevention. See Resiliency Design and Protocols for a discussion of traditional and current Juniper approaches to loop detection and prevention.


Figure 9 shows the access-core hub and spoke design. The EX8200, EX4200, and QFX3500 can serve as access devices, and the EX8200 line and the Juniper Networks MX480 and MX960 3D Universal Edge Routers can serve as core devices. This design is very commonly deployed, as it produces highly resilient and scalable networks.

Figure 9. Access-core hub and spoke network topology


Access-Core Inverse U Design

Because the access-core hub and spoke design relies on loops to ensure resiliency, it requires the use of link management protocols for loop prevention and detection, which introduces complexity and increases latency in the network. To address this concern, the access-core inverse U design modifies the access-core hub and spoke layout to add resiliency with a loop-free design.

In this topology, a connected pair of core devices serves a pair of access devices, which jointly provide network access to a pair of server/storage pools. There are two core devices (left and right) and two access devices (left and right). Each server/storage pool connects to both the left and right access devices. The left access device is linked to the left core device and the right access device is linked to the right core device. Finally, the two core devices are connected with redundant links for link resiliency.

The key to this loop-free design is that each access device is connected to a different device in the core pair. The core devices are connected to each other, but the devices in each access pair are not connected to each other. Resiliency works through the links between the core devices. Loops are avoided, because traffic that travels from an access device to the core tier (or vice versa) cannot directly return to the same access device.

Figure 10 shows the inverse U design. The EX4200 and EX8200 switches can serve as core devices. This design provides network resiliency in a loop-free environment and results in increased agility, as VLANs can be expanded anywhere in the data center network without the complexity of spanning tree. This design takes advantage of recent improvements in server link resiliency and simplifies network design. Efficient load balancing can be achieved by configuration of active/standby links on server network connections.


Figure 10. Access-core inverse U network topology


Access-Core Mesh Design
The access-core mesh design provides another alternative to the access-core hub and spoke design and is ideal for Layer 3 routing at the access tier. In the other access-core topologies, the access devices operate at L2, and the L2/L3 boundary is at the core tier. Because there is no routing at the access layer, link management protocols are supported using L2 technology. With a link management protocol such as Spanning Tree Protocol (STP), some links are blocked (unused) unless a failure occurs, and bandwidth is underutilized.

The access-core mesh design solves this problem by leveraging Layer 3 routing support at the access tier. The physical layout is the same as in the access-core inverse U design; however, the access device pair is interconnected using a Layer 2 link. The uplinks from access devices to core devices are L3 interfaces, and L3 routing protocols provide the active/active link resiliency. The details of this approach are discussed in Layer 3 Routing Protocols below.

Figure 11 shows the access-core mesh design with the L2/L3 boundary at the access tier. The QFX3500, EX4200, and EX8200 switches can serve as access devices, and the EX8200 and EX4500 can serve as core devices. This design avoids the complexity of spanning tree by leveraging the full routing capabilities of Juniper Networks access devices. Network bandwidth utilization is increasing rapidly, and effective load balancing is an important tool for increasing link utilization and thereby making the best use of existing bandwidth. By including routing at the access tier, the data center network can increase network resiliency through faster convergence and enable load balancing of traffic between the access and core tiers using equal-cost multipath (ECMP), as illustrated in the sketch that follows.
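The following configuration sketch shows how ECMP load balancing across the L3 uplinks might be enabled on an access switch running Junos OS. It is a minimal illustration only; the policy name ECMP-LB, the aggregated interfaces ae0 and ae1, and the OSPF area are assumptions, not values prescribed by this guide.

set policy-options policy-statement ECMP-LB then load-balance per-packet
set routing-options forwarding-table export ECMP-LB
set protocols ospf area 0.0.0.0 interface ae0.0
set protocols ospf area 0.0.0.0 interface ae1.0

With both uplinks advertised at equal cost, the forwarding table installs both next hops and distributes flows across them.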


Figure 11 includes a partial access-core mesh configuration on the left and a full mesh configuration on the right.

Figure 11. Access-core mesh network topology


Keeping in mind the basic topologies described in this section, we can now turn to a discussion of the associated resiliency designs and protocols.

Resiliency Design and Protocols


With the convergence of high demand services onto IP infrastructures, network outages of any kind are no longer acceptable. Even relatively small packet losses can have a negative impact on users' perceptions of service delivery, while a major node, link, or interface failure can have serious consequences for the provider. The data center design must minimize network failures whenever possible, and minimize the effects of failures that do occur.

As virtual data centers and cloud computing infrastructures evolve, they often require distribution of management controls to multiple distributed sites to share responsibilities among distributed teams, extend controls to new and distant locations, and support high availability and disaster recovery. The data center design must accommodate distributed platforms and components, so connectivity is maintained regardless of location, and access to control information is available despite changes in availability and performance. In addition, to protect an enterprise's competitive edge, business applications must be highly available, and productivity must not suffer when failures occur. When a disaster takes place, the organization must recover with minimal disruption, bringing backup business applications online again quickly and ensuring that the associated user data is protected and available.

The overall objective of resiliency design is to eliminate single points of failure. Because failures can occur at any level (application, server, network device, network OS, and physical), the overall resiliency design must address resiliency requirements at each level. We have already discussed how resiliency designs can be incorporated into physical network topologies. In this section, we discuss how resiliency options are layered on top of the physical topology.

Application Resiliency Design


The goal of application resiliency is to maintain application availability in the event of failure or disruption at any level. This section considers application resiliency using application resource pools and protected critical application resources.

Application Resource Pools


In this approach, multiple application instances are grouped together into resource pools that are distributed across the network. Resource pools can be multitiered, involving web servers, application software, and databases. During normal operation, users may access any of the application instances, subject to load balancing or other types of coordination. Because each application is installed on multiple systems in multiple locations, access to the application is maintained when any single application resource or associated connection fails.


Application resource pools are an effective resiliency solution; however, they may require synchronous state and data replication among the nodes in the resource pool to ensure that the application instances remain synchronized. When designing such a solution, it is important to plan for any synchronous coordination and load balancing, as well as to consider the associated performance, connectivity, and latency effects.

Figure 12. Application resource pools


Critical Application Resources
Some applications have critical application resources that are not feasible or desirable to replicate. These may include high-end servers that are too costly to replicate, or mission critical resources that should not be replicated for security or operational reasons. In these cases, a single server hosts the application resource, and could therefore become a single point of failure. To minimize the risks of failure for such critical application resources, data center designers can deploy the applications on very high-end servers that have built-in resiliency and introduce high availability active/standby configurations for disaster recovery. Connectivity concerns can be addressed by multihomed network links between the server and the network device, and multiple network paths for users and other resources. This type of design requires the engineering of redundant links and backup systems, and may require periodic state and data replication to keep the active and backup systems synchronized.

Figure 13. Critical application resources

Server Link Resiliency Design


The objective of server link resiliency is to maintain application and service availability if access to an individual server is interrupted. Server link resiliency can operate at several levels within the cloud data center, as shown in Figure 14. The figure shows server pools that provide compute services within the data center. This simplified figure shows a single tier topology; however, the same approach applies to all of the multitier access-core topologies. Resiliency in this context can operate at the link level, network device level, and virtual machine mobility level, as described in the following subsections.


Figure 14. Server link resiliency overview


Server Link Resiliency
To avoid a single point of failure at the link level, servers can be deployed with dual homed, resilient connections to network devices. In this arrangement, multiple physical connections are grouped in a single logical connection. The logical connection can be an active/active LAG in which multiple links are dual homed (combined into a single logical link to a network device) with load sharing, or an active/standby arrangement in which the standby link is used only if the active link fails.
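The switch side of an active/active server-facing LAG is configured as an aggregated Ethernet interface running LACP. The sketch below shows one plausible Junos OS configuration on an EX Series access switch; the member ports ge-0/0/10 and ge-0/0/11, the bundle ae0, and the VLAN name servers are illustrative assumptions only, and the server's NIC teaming must be configured to match.

set chassis aggregated-devices ethernet device-count 2
set vlans servers vlan-id 100
set interfaces ge-0/0/10 ether-options 802.3ad ae0
set interfaces ge-0/0/11 ether-options 802.3ad ae0
set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 unit 0 family ethernet-switching port-mode access
set interfaces ae0 unit 0 family ethernet-switching vlan members servers

For an active/standby arrangement, the standby behavior is typically controlled on the server side (NIC teaming), with the switch presenting ordinary access ports.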

Network Device Resiliency


In the server context, network device resiliency addresses the need to maintain connectivity if a network device fails. This can be achieved through dual homing to multiple network devices, which provides both link and network device resiliency. The principle is the same as for server link resiliency, except that the redundant active/active or active/standby links involve multiple network devices.

Virtual Machine Mobility


Virtual machine mobility addresses failures within the server itself. In this arrangement, a virtual machine on a server in one pod is paired with a virtual machine on a server in another pod, and the virtual machine on the second server takes over if the first server fails for any reason. Applications that are deployed on the virtual machine in the first pod are replicated on the virtual machine in the second pod. The second virtual machine can be deployed on any server within the data center or in another data center that has Layer 2 connectivity to the first server.

Network Device Resiliency


The following elements can be used to minimize the effects of a single point of failure of a network device or critical component of a device:
• Hot-swappable interfaces
• Unified in-service software upgrade (unified ISSU)
• Redundant switching fabric
• Redundant Routing Engine


Minimizing single points of failure is of major importance in the data center because modern service-level guarantees and five nines uptime requirements preclude the traditional practice of scheduled downtime for maintenance. In some cases, maintenance window provisions in requests for proposal (RFPs) have disappeared altogether. Globalization is also a significant factor: with multiple customers and teams working around the clock, there are no off-peak traffic periods for the always-on network. The bottom line is this: modern network operating systems must enable in-service router changes and upgrades.

Hot-Swappable Interfaces
The routing community took the first step towards facilitating unified ISSU when vendors introduced hot-swappable interfaces for network devices. Many routers, including all of those in Juniper's product line, no longer need to be reset to insert or remove an interface card. Instead, the box dynamically recognizes the new interface and begins to communicate with it immediately. New components can thus be inserted and removed from the router without taking the system down.

Unified In-Service Software Upgrades


The ability to provide unified ISSU (replacement of an entire operating system without a planned outage) is unique to devices running Junos OS. Juniper Networks customers can upgrade a complete operating system, not just individual subsystems, without control plane disruption and with minimal disruption of traffic. Upgrading is a complex operation that requires extensive software changes, from the control plane code to microcode running on the forwarding cards, and Junos OS is unique in its ability to support these changes without bringing the network down or causing major traffic disruption.

An upgrade of this kind is impossible for users of other systems, who are forced to juggle multiple release trains and software versions when planning each upgrade. Careful planning and testing are required to choose the right release, one that includes the new functions but does not forego any existing features or hardware support. Also, only Junos OS provides an automatic configuration check before the upgrade. With other solutions, users are usually notified of an inconsistent, unsupported configuration after the upgrade, when it is too late to abort. Due to these risks, many IT groups avoid upgrading their software unless it is absolutely necessary, and continue to run old versions of code. This can severely limit options when the network must support new requirements and when users are impatient for new features. The associated uncertainty can create havoc with quarterly budgets and project resources.

Unified ISSU Methodology


Considering the immense variation among today's IP network topologies, equipment, and services, it is not surprising that various router vendors have taken different approaches to unified ISSU. The right approach is one that addresses the practical problems in today's networks and has the flexibility to meet the needs of future networks. Although the unified ISSU process is complex and approaches to the problem vary, there are two major goals:
• Maintain protocol adjacencies: a broken adjacency makes it necessary to recalculate routing paths. If this occurs, tens of thousands of protocol adjacencies must be reset and reinitiated, and as many as a million routes removed, reinstalled, and processed to reestablish network-wide forwarding paths.
• Meet SLA requirements: an upgrade mechanism should not affect network topology or interrupt network services. Noticeable packet loss, delay, and jitter can be extraordinarily expensive in terms of SLA penalties and damaged customer confidence.
Unified ISSU accomplishes these goals by leveraging nonstop active routing (NSR), which eliminates routing disruptions so that L2/L3 adjacencies can stay alive, and by minimizing packet loss to meet SLA requirements.
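Operationally, a unified ISSU is initiated with a single command once GRES, NSR, and synchronized commits (described later in this section) are already enabled on a platform with redundant Routing Engines. A minimal sketch follows; the package path is a placeholder, not an actual Junos OS image name.

user@core> request system software in-service-upgrade /var/tmp/<junos-install-package>.tgz

The procedure upgrades the backup Routing Engine first, performs a graceful switchover, and then upgrades the former master, so the control plane remains available throughout.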


Redundant Switching Fabric


In a redundant configuration, a switch fabric module (SFM) is used with two switch fabrics and Routing Engines (REs) to achieve full bandwidth along with RE and switch control redundancy and switch fabric redundancy. The main function of the SFM is to provide a redundant switching plane for the device. For example, the SFM circuitry in a Juniper Networks EX8208 Ethernet Switch is distributed across three modules: two RE modules and one SFM module. Any two of these three modules must be installed and functional to provide a working switch fabric with no redundancy. The third module, when present, provides partial redundancy (2+1) for the switching functionality, such that if either of the two functional modules becomes nonoperational, the third module takes over. Working together, the RE and SFMs deliver the necessary switching capacity for the EX8208 switch. When the second RE module is present, the additional switch fabric serves in hot-standby mode, providing full 2+1 switch fabric redundancy. The SFMs are hot-swappable and field-replaceable, enabling failed units to be easily replaced without service interruption. The two active, load-sharing switch fabrics on the RE and SFMs collectively deliver up to 320 Gbps (full-duplex) of packet data bandwidth per line-card slot, providing sufficient capacity to support future 100GbE deployments without requiring any forklift upgrades or changes to the network infrastructure. The EX8208 switch backplane is designed to support a maximum fabric bandwidth of 6.2 Tbps.

Redundant Routing Engine


Two to ten EX4200 switches can be interconnected to create a Virtual Chassis configuration that operates as a single network entity. Every Virtual Chassis configuration has a master and a backup. The master acts as the master RE and the backup acts as the backup RE. The Routing Engine provides the following functionality:
• Runs the routing protocols
• Provides the forwarding table to the Packet Forwarding Engines (PFEs) in all member switches of the Virtual Chassis configuration
• Runs other management and control processes for the entire Virtual Chassis configuration
The master RE, which resides in the master of the Virtual Chassis configuration, runs Junos OS in the master role. It receives and transmits routing information, builds and maintains routing tables, communicates with interfaces and PFE components of the member switches, and has full control over the Virtual Chassis configuration. The backup RE, which resides in the backup of the Virtual Chassis configuration, runs Junos OS in the backup role. It stays in sync with the master RE in terms of protocol states and forwarding tables. If the master becomes unavailable, the backup RE takes over the functions that the master RE performs.
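Master and backup roles in an EX4200 Virtual Chassis are typically influenced through mastership priority. The sketch below shows one common pattern, assuming members 0 and 1 are the intended master and backup; assigning both the same high priority avoids unnecessary preemption, and the member numbers are assumptions for illustration.

set virtual-chassis member 0 mastership-priority 255
set virtual-chassis member 1 mastership-priority 255

The operational command show virtual-chassis status can then be used to confirm which member holds the master RE role.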

Network OS Resiliency and Reliability Features


Data center designers can significantly enhance resiliency in the data center simply by deploying Juniper devices. The Junos operating system, which runs on all Juniper devices, incorporates the following important resiliency features to protect against the effects of internal failures:
• Separation of the router's control and forwarding planes
• Operating system modularity
• Single code base
• Graceful Routing Engine switchover (GRES) and graceful restart
• Nonstop active routing (NSR)
Juniper Networks introduced GRES and NSR to the routing community, and both are in widespread use today.

The need to forward packets and process routes simultaneously presents a major challenge for network routers. If the traffic load through a router becomes very heavy, most resources might be used for packet forwarding, causing delays in route processing and slow reactions to changes in the network topology. On the other hand, a significant change in the network topology might cause a flood of new information to the router, causing most resources to be used in performing route processing and slowing the router's packet forwarding performance.


Routing and Forwarding on Separate Planes


Figure 15 depicts the pioneering Juniper architecture that enables continuous systems by clearly separating the control plane from the forwarding plane. The key to the architecture lies in internal resource allocation. If most resources are consumed by one of the two basic functions (control or forwarding), the other function suffers and the router is destabilized. The solution is to perform these functions in separate physical entities, each with its own resources.

Figure 15. Separate control and forwarding planes (control plane: route processes, RIB, management processes, kernel, FIB, security; forwarding plane: FIB, Layer 2 processing, interfaces)


The control plane is also known as the routing plane, and its primary component is the Routing Engine, which is redundant in many Juniper platforms. On all platforms, the Junos OS control plane is based on a BSD kernel. The forwarding plane is also known as the data plane, and its primary component is the PFE. The control plane maintains peer relationships, runs routing protocols, builds the routing table, maps destination IP addresses to physical router interfaces, and builds the FIB. The FIB is exported to the forwarding plane, which uses it to send packets out of the correct interface and on to the next-hop router. Having a copy of the forwarding table in the forwarding plane makes it possible for the router to continue forwarding packets even if a software bug or routing issue causes problems in the control plane.

Modular Software
The division of labor between control and forwarding planes has its parallel in the next essential architectural characteristic of Junos OS: its fundamental modularity. A key advantage of modularity is the inherent fault tolerance that it brings to the software. Each module of Junos OS runs in its own protected memory space and can restart independently, so one module cannot disrupt another by scribbling on its memory. If there is a software problem with Junos OS production code, the problem can be quickly identified, isolated, and fixed without an interruption in service. Junos OS automatically restarts failed modules without having to reboot the entire device. The modular Junos OS design is in stark contrast to monolithic architectures in which the operating system consists of a single large code set. In a monolithic architecture without isolation between processes, a malfunction may cause a full system crash, as one failure creates memory leaks and other problems that affect many other processes. The device must restart to correct the problem, putting the platform out of service for the restart period.

Single Code Base


Unlike other network operating systems that splinter into many different programs and images while remaining under a common name, Junos OS has remained a single, cohesive system throughout its life cycle. Juniper Networks engineers develop each Junos OS feature only once, and then apply it to all devices and security platforms where it is needed without requiring a complete overhaul of the code. As a result, each new version of Junos OS is a superset of the previous version. Customers do not need to add separate packages when a feature is desired, but only need to enable it.


Juniper Networks methodically enhances the single Junos OS source base through a highly disciplined development process that follows a single release train. Developers ensure a single consistent code set for each feature, and the result is well understood, extensively tested code. The Junos OS testing process includes repeated testing with automated regression scripts. Developed over many years, these test scripts are key pieces of Juniper Networks' intellectual property. Through the extensive testing of each Junos OS release, bugs and other problems are likely to be found and corrected by Juniper engineers before customers ever see the new version.

Because the same code runs across all Juniper Networks routers, each feature provides a common user experience on all devices. A BGP or OSPF configuration works the same way on a branch router as it does in the core of a service provider network, and also uses the same diagnostic and configuration tools. When a network rolls out on multiple Juniper platforms, a single operations team already has the knowledge required to configure and monitor all of the new devices. This kind of efficiency can significantly reduce a network's operating expense.

Graceful Routing Engine Switchover (GRES)


Most routers today make use of redundant control plane processors (REs in Juniper Networks terminology), so that if one processor fails, the other can take over router operations. One RE serves as the master and the other as backup. The two REs exchange frequent keepalive messages to detect whether the other is operational. If the backup RE stops receiving keepalives after a specified interval, it takes over route processing for the master. The limiting factor in this scenario is that the data plane's PFE is reinitialized during the switchover from master to backup Routing Engine. All data plane kernel and forwarding processes are restarted, and traffic is interrupted. To prevent such a disruption, control plane state information must be synchronized between the master and backup RE. This is where GRES comes in.

GRES provides stateful replication between the master and backup Routing Engines. Both REs maintain a copy of all important entities in the Junos OS kernel, such as interfaces, routes, and next hops. In this way, the backup does not need to learn any new information before taking over from the master. The router's forwarding plane breaks its connection with the routing tables on the old master and connects to the new master. From the point of view of packet forwarding, switching the PFE connection from one RE to the other happens immediately, so no packet loss occurs.

Under GRES, the control plane routing protocol process restarts. Neighboring routers detect the restart and react to the event according to the specifications of each protocol. If there is an RE switchover in router X, for example, any neighboring router that has a peering session with router X sees the peering session fail. When router X's backup RE becomes active, it reestablishes the adjacency, but in the meantime the neighbor has advertised to its own neighbors that router X is no longer a valid next hop to any destination beyond it, and the neighbor routers start to look for an alternate path. When the backup RE comes online and reestablishes adjacencies, its neighbors advertise the information that router X is again available as a next hop and devices should again recalculate best paths. These events, called routing flaps, consume resources on the control planes of all affected routers and can be highly disruptive to network routing. To preserve routing during an RE failover, GRES must be combined either with graceful restart protocol extensions or NSR (Juniper's recommended solution).

Graceful restart protocol extensions provide a solution to the flapping problem, but not necessarily the best solution. Graceful restart is defined by the Internet Engineering Task Force (IETF) in a series of Requests for Comments (RFCs), each specific to a particular routing protocol. Graceful restart specifies that if router X's control plane goes down, its neighbors do not immediately report to their own neighbors that router X is no longer available. Instead, they wait a certain amount of time (or grace period). If router X's control plane comes back up and reestablishes its peering sessions before the grace period expires, as would be the case during an instantaneous RE switchover, the temporarily broken peering sessions are not visible to the network beyond the neighbors. With graceful restart, an RE failover is transparent to all nodes in the network, with the exception of router X's peers. This is a great advantage, as there is no disruption to forwarding on router X, its peers, or any other routers across the network. There is no change in traffic patterns, and no impact on latency, packet ordering, or optimal route selection.


During the grace period, it is assumed that the node that is not routing is still forwarding traffic using preserved state, often called nonstop forwarding. The graceful wait interval is configurable by the user and negotiated between the nodes. It is usually several seconds long. During the graceful wait interval, the traffic is not supported by active routing, so a restarting nonstop forwarding node could potentially send traffic to a destination that is no longer valid, often called blackholing traffic. Other considerations associated with graceful restart include the following:
• Each neighbor is required to support the graceful restart protocol extensions.
• Graceful restart must stop if the network topology changes during the grace period.
• In some cases, it is not possible to distinguish between link failure and control plane failure.
• Routing reconvergence could exceed the grace period, for example if router X has hundreds of BGP peers or protocol interdependencies that complicate the reconvergence process.
• There is no widespread industry acceptance of graceful restart.
The requirement for all nodes to be running the graceful restart extensions is particularly bothersome in a multivendor, multichassis environment, and even more difficult when a different organization controls each peering router. In addition, during the graceful restart period, router X is not removed from the network topology, and the topology is therefore frozen. This means that graceful restart should only be used when the network is stable, a circumstance that is difficult to guarantee. A better solution is required, one that is transparent to network peers, does not require peer participation or allow adjacencies or sessions to drop, and has a minimal impact on convergence. The RE switchover should also be allowable at any point, no matter how much routing is in flux.
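For reference, GRES itself (and, where it is still required for interoperability, protocol graceful restart) is enabled with a small configuration stanza on dual-RE platforms. This is a minimal sketch, not a recommendation to prefer graceful restart over the NSR approach described next:

set chassis redundancy graceful-switchover
set system commit synchronize
set routing-options graceful-restart

The commit synchronize statement keeps the configuration identical on both Routing Engines, which GRES assumes.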

Nonstop Active Routing


Juniper's solution to the problems presented by graceful restart is known as nonstop active routing (NSR). This term may be familiar to users, as Juniper is not the only router vendor that has caught the nonstop routing bug. However, Juniper has implemented an approach that is radically new and innovative. Juniper engineers define nonstop as the integrity of control and forwarding planes in the event of failovers or system upgrades, including minor and major release changes. Routers running Junos OS do not miss or delay any routing updates when network problems occur. The goal of a nonstop operation is ambitious, and reflects Juniper's innovation and expertise as it introduces these new concepts and this new vision for the industry.

With NSR, the responsibility for repairing a failed Routing Engine is placed entirely on the router itself. There is no need to modify or extend existing routing protocols or place any demands on peers. NSR uses the same infrastructure as GRES to preserve interface and kernel information. However, NSR also preserves routing information and protocol sessions by running the routing protocol process on both REs. In addition, nonstop active routing preserves TCP connections maintained in the kernel. From a system architecture point of view, the principal difference between NSR and the graceful restart protocol extensions is that both REs are fully active in processing protocol sessions. Both REs run the routing processes and receive routing messages from network neighbors. Selection of the master is now a matter of selecting one of the two running REs and connecting its outbound message queue to the network to communicate with neighbors. NSR is self-contained and does not rely on helper routers (as in graceful restart) to assist the routing platform in restoring routing protocol information.

Nonstop Bridging/Nonstop Routing


Nonstop bridging and routing mechanisms enhance the resiliency characteristics of network protocols by preventing service interruptions during the brief period when the backup RE takes over for a failed RE. Left to their own devices, routing and switching protocols would respond to the absence of the master RE by beginning to reconverge network paths to route around what they believe to be a failed device. The Juniper nonstop routing and nonstop bridging mechanisms prevent such a reconvergence from occurring, thus maintaining service continuity.


NSR allows a routing platform with redundant Routing Engines to switch over from a primary RE to a backup RE without alerting peer nodes that a change has occurred. Nonstop bridging extends these benefits to the L2 protocols implemented in Ethernet switching. Together, these features enable RE switchover that is transparent to neighbors, maintaining L2 and L3 stability for supported platforms and protocols. Because NSR does not disrupt protocol adjacencies, the RE switchover is transparent to neighbors. Even if the routing topology changes during the switchover, routing remains stable.
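A minimal configuration sketch for NSR and nonstop bridging follows. NSR builds on GRES and synchronized commits, so those statements are included again here; the exact set of protocols supported by NSR and nonstop bridging depends on the platform and Junos OS release.

set chassis redundancy graceful-switchover
set system commit synchronize
set routing-options nonstop-routing
set protocols layer2-control nonstop-bridging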

Network Resiliency Designs and Protocols


We have discussed how to design resiliency into the data center by adopting designs that include redundant application resources, compute resources, network links, and network devices, in addition to component level redundancy in each device. Network resiliency designs and protocols tie these together, working on top of the other resiliency layers to ensure that the architected resiliency elements operate as desired. Network link resiliency and efficient traffic forwarding both require the use of link management protocols to facilitate traffic flow and eliminate logical loops. This section describes the following resiliency protocol requirements and options:
• Access-core inverse U loop-free design
• Multichassis LAGs
• Redundant trunk groups
• Virtual Chassis at the core
• L3 routing protocols at the access tier
• Multiple Spanning Tree Protocol (MSTP)
• MPLS at the core level for large deployments

Access-Core Inverse U Loop-Free Design


The access-core inverse U loop-free topology described previously provides link and device resiliency in the network without introducing forwarding path loops. In this approach, each server pool is connected to two different access devices, and each access device is connected to one of a pair of core devices. Because the access devices are not directly connected to each other, loops do not form at the access and core tiers, and link management protocols are not required.

Resiliency operates at multiple levels. Each server pool has connections to multiple access devices, so failure of an individual access device does not cause service interruption. Because the core devices are paired, failure of a core device also avoids service interruption. Resiliency can be further enhanced through application and server resiliency features that operate on top of the physical and network layers.

Because the inverse U design is loop-free, VLANs can be expanded broadly. 802.1Q trunk links between access and core devices can carry traffic for VLANs across access and core devices. Broadcast domains can be as large as needed. Agility is enhanced because devices requiring L2 adjacency can be placed anywhere throughout the broadcast domain. Further, the core devices can act as default gateways and run Virtual Router Redundancy Protocol (VRRP) to provide routing resiliency.


Figure 16. Resiliency with access-core inverse U


Multichassis Link Aggregation Groups
IEEE 802.3ad link aggregation enables multiple Ethernet interfaces to be grouped together to form a single link layer interface, also known as a link aggregation group (LAG) or bundle. A typical LAG deployment aggregates trunk links between an access device and a core device for a point-to-point connection. On Juniper Networks MX Series 3D Universal Edge Routers, multichassis LAG enables a device to form a logical LAG interface with two or more other devices. Multichassis LAG provides additional benefits over traditional LAG in terms of node level redundancy, multihoming support, and loop-free L2 networking without STP. Multichassis LAG can be configured for VPLS routing instances, circuit cross-connect (CCC) applications, and L2 circuit encapsulation types. The multichassis LAG devices use Link Aggregation Control Protocol (LACP) to exchange control information between the two multichassis LAG network devices.

Multichassis LAG provides load balancing and loop management for access-core hub and spoke designs. Multichassis LAG involves multiple L3 devices at the core tier and combines multiple connections from each access device to the core tier into a single logical group to provide redundant connections. Because the multiple links operate as a single logical link, no logical L2 loop is created. If one of the links in the LAG fails, traffic is automatically sent over the other links in the group.

A basic multichassis LAG configuration involves two core devices and two access devices. Each access device connects to both core devices by way of single or multiple 10GbE links. MX Series devices at the core tier provide the multichassis LAG configuration, supporting the links from the access devices. Multichassis LAG is managed completely from the core device, and the access devices do not need to be aware of it. LACP manages the physical loops that are created by the access-to-core connections, preventing the creation of logical loops for traffic forwarding. Multichassis LAG can be configured using active/active or active/standby links. The pair of core devices providing multichassis LAG functionality must have L3 connectivity for state replication.
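The sketch below outlines the general shape of a multichassis LAG configuration on one of the two MX Series core devices, with an Inter-Chassis Control Protocol (ICCP) session to its peer for state replication. The interface name ae0, the chassis ID, the LACP system ID, and the 192.0.2.x addresses are illustrative assumptions, and the exact hierarchy varies by platform and Junos OS release, so the MX Series documentation should be consulted for a deployable configuration.

set interfaces ae0 aggregated-ether-options lacp active
set interfaces ae0 aggregated-ether-options lacp system-id 00:00:00:11:11:11
set interfaces ae0 aggregated-ether-options lacp admin-key 1
set interfaces ae0 aggregated-ether-options mc-ae mc-ae-id 1
set interfaces ae0 aggregated-ether-options mc-ae chassis-id 0
set interfaces ae0 aggregated-ether-options mc-ae mode active-active
set interfaces ae0 aggregated-ether-options mc-ae status-control active
set protocols iccp local-ip-addr 192.0.2.1
set protocols iccp peer 192.0.2.2 redundancy-group-id-list 1
set protocols iccp peer 192.0.2.2 liveness-detection minimum-interval 1000

The second core device mirrors this configuration with chassis-id 1, status-control standby, and the same LACP system ID, so that the access device sees a single 802.3ad partner.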


Figure 17 shows a LAG configuration with MX Series devices at the core tier. The figure shows an 802.3ad LAG link between the core devices. The connection can also be an L3 link.
Figure 17. Multichassis LAG configuration


In this design, there are no logical L2 loops. VLANs can be configured anywhere, and there are no limitations on the size of the broadcast domain. This approach provides increased agility, ease of integration, and scalability, along with a load-balancing mechanism to make effective use of the bandwidth in each LAG. The MX Series devices at the core tier provide advanced routing, QoS, and virtualization capabilities, along with support for multichassis LAG for links from the access devices.

Redundant Trunk Groups


Redundant Trunk Groups (RTGs) provide resiliency with loop prevention for access-core hub and spoke designs. In an RTG configuration, each access device has a separate link to each of a pair of connected core tier devices. One of the links is active and one is passive. Traffic passes through the active link but is blocked on the secondary link. If the active link goes down or is disabled administratively, the secondary link becomes active and begins forwarding traffic. Loops are avoided because only one link is active at any given time.

RTG is supported on Juniper Networks EX Series Ethernet Switches. The EX Series devices can be at the access tier, core tier, or both, and RTG can be configured at the access or core tier. Configuring RTG at the core tier allows you to support RTG on multiple access devices from a single core device. Because control can be concentrated in one place, only local protocol configuration is required. There are no L2 loops, and convergence following link failure is rapid.

When RTG is configured on access devices, the links to both the left and right core devices are configured as RTG members with one link as active and the other as standby. When the active link fails, the traffic converges to the standby link. Effective load balancing can be achieved by designating the connection to the core left device as the primary link for some access devices, and the connection to the core right device as the primary link for other access devices. A similar load balancing arrangement can be adopted when RTG is configured on the core devices.
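On an EX Series access switch, an RTG is a short configuration under ethernet-switching-options; spanning tree must not run on the RTG member interfaces. The group name RTG-CORE and the uplink interfaces xe-0/1/0 and xe-0/1/1 below are assumptions for illustration only.

set ethernet-switching-options redundant-trunk-group group RTG-CORE interface xe-0/1/0.0 primary
set ethernet-switching-options redundant-trunk-group group RTG-CORE interface xe-0/1/1.0
set protocols rstp interface xe-0/1/0 disable
set protocols rstp interface xe-0/1/1 disable

With this configuration, the uplink to the left core device carries traffic and the uplink to the right core device takes over only if the primary fails.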


Figure 18 shows RTG configured on access devices.


Figure 18. RTG configuration


Virtual Chassis at the Core
You can deploy a Virtual Chassis as a logical group of multiple core devices. The EX8200 line supports a Virtual Chassis configuration of two devices, and the EX4200 supports a Virtual Chassis configuration of up to 10 devices. In the arrangement shown in Figure 19, two core devices are combined into a Virtual Chassis using a 10GbE fiber connection running the Virtual Chassis protocol. The Virtual Chassis core serves four access devices. Downstream, each access device supports one or more server/storage pools. Upstream, each access device connects to the Virtual Chassis using multiple 1GbE/10GbE uplinks in an active/active LAG.

The Virtual Chassis approach offers the advantages of other access-core layouts without requiring the use of link management protocols. Because the connections are to the full chassis, not to the individual devices in the chassis, links can be configured as LAGs with no network loops. All links are effectively utilized through standard load-balancing mechanisms. With this design, network agility is increased, and security services integration and WAN integration are simplified. All link management is handled by the Virtual Chassis protocol within the single core control plane, and all device level resiliency capabilities operate at the core.

Figure 19. Virtual Chassis core configuration


Layer 3 Routing Protocols


L3 routing protocols can be used for link management in the context of any of the previously discussed multitier designs (access-core hub and spoke, access-core inverse U, and access-core mesh). Link management protocols can run at L2 or L3. At the L2 level, link management protocols can block certain links to avoid loops, thus converting to loop-free operation. Avoidance of loops is critical at the Layer 2 level, because L2 forwarding relies on broadcasting. By contrast, Layer 3 protocols can use multiple paths with active load balancing and provide network resiliency through fast convergence. Juniper access and core devices support L3 at the access and core tiers and use L3 for loop reduction and for resiliency. In these designs, the L2/L3 boundary is at the access tier, and the access tier serves as the default gateway for the servers, which are typically dual homed.

For Layer 3 routing at the access tier, at least two logical devices sharing a VLAN are required to provide default gateway resiliency. VRRP manages router redundancy for the default gateway, preventing single points of failure. The advantage of using VRRP is that you gain higher availability for the default path without requiring configuration of dynamic routing or router discovery protocols on every end host. VRRP routers viewed as a redundancy group share the responsibility for forwarding packets as if they owned the IP address corresponding to the default gateway configured on the hosts. At any time, one of the VRRP routers acts as the master, and the other VRRP routers act as backups. If the master router fails, a backup router becomes the new master. In this way, router redundancy is always provided, allowing traffic on the LAN to be routed without relying on a single router. There is always a master for the shared IP address. If the master goes down, the remaining VRRP routers elect a new master VRRP router. The new master forwards packets on behalf of the owner by taking over the virtual media access control (MAC) address used by the owner.

When implemented in the network, VRRP assumes that if any link to a subnet is active, the router has access to the entire subnet. VRRP leverages the broadcast capabilities of Ethernet. If one of the routers in a VRRP configuration is running, Address Resolution Protocol (ARP) requests for IP addresses assigned to the default gateway always receive replies. Additionally, end hosts can send packets outside their subnet without interruption.
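A minimal VRRP sketch for one member of an access device pair follows, using a routed VLAN interface on an EX Series switch. The VLAN name and ID, the 10.1.100.0/24 subnet, the physical address .2, and the virtual gateway address .1 are all assumptions for illustration; the peer device would use its own physical address (for example .3) and a lower priority.

set vlans server-vlan vlan-id 100
set vlans server-vlan l3-interface vlan.100
set interfaces vlan unit 100 family inet address 10.1.100.2/24 vrrp-group 1 virtual-address 10.1.100.1
set interfaces vlan unit 100 family inet address 10.1.100.2/24 vrrp-group 1 priority 200
set interfaces vlan unit 100 family inet address 10.1.100.2/24 vrrp-group 1 preempt
set interfaces vlan unit 100 family inet address 10.1.100.2/24 vrrp-group 1 accept-data

Servers in the VLAN simply point their default gateway at 10.1.100.1; a failure of the master is handled by VRRP without any host reconfiguration.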

Figure 20. Layer 3 configuration


The access and core devices can run standard routing protocols, including OSPF or RIP, or use static routing. For dynamic routing, the protocols can take advantage of Bidirectional Forwarding Detection (BFD). The BFD protocol, which was developed at Juniper Networks, is a simple, high-speed hello protocol that verifies connectivity between pairs of systems. BFD neighbor systems negotiate a peer relationship, and each neighbor specifies how quickly it can receive BFD packets. BFD rates can be specified in sub-millisecond increments. BFD is extraordinarily useful because it places a minimal load on network devices, markedly improves failure detection times, and reduces latency within the router and between the router and its neighbors. With nonstop active routing enabled, BFD session state is saved on both the master and backup Routing Engines. When an RE switchover occurs, BFD session state does not need to be restarted, and peer routers continue to interact with the routing platform as if no change had occurred.

The Layer 3 approach provides the highest link utilization, with optimized network paths determined by the routing protocols. Because operation is at L3, loops are not a concern. The broadcast domain is limited to the pair of access devices; however, with a Virtual Chassis deployment, a VLAN restricted to an access device is not limited to a single rack or row. Up to 10 EX4200 switches can be flexibly deployed as part of the Virtual Chassis within the row, across the row, or across the data center location. With this agile deployment, Virtual Chassis extends the VLAN within the data center even when the VLAN is limited to a single pair of access devices.
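As a sketch of how BFD is attached to a routing protocol in Junos OS, the following lines enable BFD on an OSPF-enabled uplink; the interface ae0.0 and the 300 ms/3x timers are illustrative assumptions chosen for a conservative detection time of roughly one second.

set protocols ospf area 0.0.0.0 interface ae0.0 bfd-liveness-detection minimum-interval 300
set protocols ospf area 0.0.0.0 interface ae0.0 bfd-liveness-detection multiplier 3

If BFD declares the neighbor down, OSPF tears down the adjacency immediately rather than waiting for its own dead timer, so traffic shifts to the remaining equal-cost paths much faster.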

Multiple Spanning Tree Protocol


Traditionally, STP has been used to create a logical loop-free topology in L2 networks. STP calculates the best path through a switched network that contains redundant paths and uses bridge protocol data unit (BPDU) packets to exchange information between switches. Rapid Spanning Tree Protocol (RSTP) enhanced STP by decoupling the states and roles of ports, using fewer port states than STP, introducing additional port roles, and providing faster convergence times. However, it still does not make good use of all available paths within a redundant L2 network. With RSTP, all traffic from all VLANs follows the same path as determined by the spanning tree; therefore, redundant paths are not utilized.

Multiple Spanning Tree Protocol (MSTP) overcomes the RSTP limitations and allows load sharing through the use of multiple spanning-tree instances (MSTIs). MSTP supports the building of multiple spanning trees over trunks by grouping and associating a set of VLANs with each spanning-tree instance. MSTP also provides the capability to logically divide a Layer 2 network into regions. Every region has a unique identifier and can contain multiple instances of spanning trees. All regions are bound together using a common and internal spanning tree (CIST), which is responsible for creating a loop-free topology across regions (while the MSTIs control topology inside regions). MSTP uses RSTP as its convergence algorithm and is interoperable with earlier versions of STP.

To be part of a common MSTP region, a group of switches must share the same configuration attributes, which consist of configuration name, revision level, and VLAN-to-MSTI mapping. If one of these attributes differs between two switches, they are considered part of different regions. In order for regions to communicate, a common spanning tree (CST) instance runs across all regions. The CST also forwards traffic for the VLANs that are not covered by any MSTI. Up to 64 MSTIs are supported in each region on the MX Series and EX Series platforms.
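A minimal MSTP sketch for switches in one region follows; the region name DC-POD-1, the VLAN IDs, and the MSTI numbering are assumptions for illustration. Every switch in the region must carry identical configuration-name, revision-level, and VLAN-to-MSTI mappings.

set protocols mstp configuration-name DC-POD-1
set protocols mstp revision-level 1
set protocols mstp interface all
set protocols mstp msti 1 vlan 100
set protocols mstp msti 1 vlan 101
set protocols mstp msti 2 vlan 200
set protocols mstp msti 2 vlan 201

Load sharing is then achieved by making one core switch the root bridge for MSTI 1 and the other core switch the root bridge for MSTI 2, so each uplink carries roughly half of the VLANs.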


Figure 21 shows an example MSTP implementation. This design is typically used for legacy compatibility.

Figure 21. MSTP configuration


MPLS at Core Level for Large Deployments
We have described the two-tier network topologies that can be deployed within a data center or a segment of a large data center. In today's environment, with an ever growing need for expansion and high capacity, the data center may grow beyond the size that can be supported efficiently with two-tier designs. In addition to capacity and expansion considerations, it is often necessary for data centers to be distributed within a physical location or across multiple physical locations for resiliency or compliance reasons and to accommodate consolidation efforts. MPLS provides an efficient and modular approach to scale the data center network, while incorporating agility, resiliency, service guarantees, and simplified operations. This subsection describes how MPLS-based designs can be used to interconnect data center network segments within a single data center location or across multiple locations.

In addition to meeting bandwidth requirements, the network must support segmentation and service assurance for critical applications. The traditional approach is to build separate networks to meet different performance requirements for applications and services. However, this approach is not scalable and is cost prohibitive. MPLS facilitates sharing of expensive network interconnections by creating virtual network connections called label-switched paths (LSPs), and by providing granular policy control to define the service quality of traffic flows over the LSPs. Service providers have successfully used this open standard technology to support multi-tenancy of customer WAN connections and meet scale requirements, while enhancing operational efficiency and reducing time to market. Data center designers should consider the MPLS open standard and the large scale efficiency of network virtualization for interconnect networks between data center locations.


MPLS provides the most efficient method to extend a network segment across data center locations using L3 VPN technology, which supports dynamic discovery and minimizes operational configuration. As shown in the following figure, each data center network segment consists of a two-tier network architecture (access-core). Each segment connects to an MPLS-enabled network through a pair of edge tier devices. Juniper Networks M Series Multiservice Edge Routers and MX Series 3D Universal Edge Routers support MPLS and can be dedicated to edge functions or consolidated for core/edge functions based on scale and performance requirements.

The MPLS network consists of an MPLS core network and an MPLS edge network. The core network transports MPLS labeled packets and runs the MPLS signaling protocol to dynamically distribute labeling and traffic forwarding information. It can also support traffic engineering capabilities to provide service guarantees for application traffic. The two major protocols used for signaling LSPs are LDP and RSVP; the IETF does not specify particular signaling for dynamic LSPs. Each has a unique function and purpose. A simpler protocol with hop-by-hop signaling and interior gateway protocol (IGP) path dependency, LDP offers ease of configuration and troubleshooting. RSVP provides granular traffic engineering capabilities. It reserves whole end-to-end paths and can provide its own path selection. Implementing RSVP requires manual configuration for each LSP at the ingress node. In contrast, by simply enabling LDP, LSP connectivity occurs among all routers. The following is a more detailed explanation of this process.

The MPLS edge provides the mapping of the local network segment to the MPLS network segment. The local segment can be identified by a VLAN or set of VLANs. The edge also provides automatic discovery and signaling to extend the segments across the MPLS network to other data center network segments. For MPLS VPN signaling, MP-iBGP is used. Using MP-iBGP, the label edge router (LER) acquires reachability information about a given VPN network through the BGP community and the VPN-IPv4 address family. Data center designers can also take advantage of route reflector support in iBGP for large scale MPLS deployments, with autodiscovery and signaling simplifying operations when new VPNs or sites are introduced. This capability significantly reduces the operational effort.
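The sketch below shows the general shape of the edge-router configuration that places one data center segment into an MPLS L3 VPN. All names and addresses (the core-facing interface xe-0/1/0, the loopback addresses 10.255.1.x, the routing instance DC-SEGMENT-A, and the route target 65000:100) are assumptions for illustration, and LDP is shown as the LSP signaling protocol for simplicity.

set interfaces xe-0/1/0 unit 0 family inet address 192.0.2.0/31
set interfaces xe-0/1/0 unit 0 family mpls
set protocols mpls interface xe-0/1/0.0
set protocols ldp interface xe-0/1/0.0
set protocols ospf area 0.0.0.0 interface xe-0/1/0.0
set protocols ospf area 0.0.0.0 interface lo0.0 passive
set protocols bgp group IBGP type internal
set protocols bgp group IBGP local-address 10.255.1.1
set protocols bgp group IBGP family inet-vpn unicast
set protocols bgp group IBGP neighbor 10.255.1.2
set routing-instances DC-SEGMENT-A instance-type vrf
set routing-instances DC-SEGMENT-A interface ge-1/0/0.100
set routing-instances DC-SEGMENT-A route-distinguisher 10.255.1.1:100
set routing-instances DC-SEGMENT-A vrf-target target:65000:100
set routing-instances DC-SEGMENT-A vrf-table-label

Because the vrf-target community is shared, a matching routing instance on the edge router at another location imports these routes automatically, which is the autodiscovery behavior described above.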

Figure 22. MPLS design (MX Series edge routers connect data center segments built from EX8200, EX4200, and QFX3500 switches across the MPLS network)


It is also possible to enable VPLS over MPLS to extend Layer 2 connectivity across multiple locations. VPLS allows different sites to communicate as if they were connected to the same LAN. This extension of L2 connectivity may be required to support virtual machine mobility across data center locations, which provides the agility required in today's cloud environment. Virtual machine mobility also requires guaranteed service quality, which can be supported using the traffic engineering capabilities of the MPLS network.
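A minimal sketch of a BGP-signaled VPLS instance on an edge router follows. The customer-facing interface, route distinguisher, route target, and site values are placeholders for illustration; the MP-iBGP group from the earlier sketch also needs the Layer 2 VPN signaling family for VPLS autodiscovery.

  # VPLS routing instance stitching a local VLAN segment into the MPLS network
  set routing-instances DC-VPLS instance-type vpls
  set routing-instances DC-VPLS interface ge-0/0/1.0
  set routing-instances DC-VPLS route-distinguisher 65000:100
  set routing-instances DC-VPLS vrf-target target:65000:100
  set routing-instances DC-VPLS protocols vpls site-range 10
  set routing-instances DC-VPLS protocols vpls site dc1 site-identifier 1
  # BGP must carry Layer 2 VPN signaling for VPLS autodiscovery
  set protocols bgp group dc-interconnect family l2vpn signaling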


MPLS traffic engineering provides an end-to-end QoS experience as well as fast network convergence (~50 ms) in case of network connection failures. This increases the transparency of network failures to applications and minimizes or eliminates disruption to application services. The MPLS backbone can also be extended to enterprise WAN connections, improving application delivery over the cloud network and meeting the performance and high availability requirements for application delivery and increased business productivity.

The MPLS network provides the traffic separation and segmentation needed to create virtual network links that support on-demand resource allocation. Simple and transparent, MPLS automatically handles many of the details involved in segmenting and transmitting traffic. MPLS technologies are fast enough to respond to demand, fault-tolerant, and able to provide fast convergence and recovery times. As an industry leader in the development and deployment of MPLS, Juniper Networks leads the way in making it possible for enterprises and service providers to implement network architectures and services based on MPLS. The MX Series provides a wide range of MPLS features and functionality powered by Junos OS.
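The fast convergence described above typically comes from local repair of RSVP-signaled LSPs. A minimal sketch follows, extending the hypothetical dc1-to-dc2 LSP and xe-0/0/0 interface from the earlier example; the two protection styles shown are alternatives, not requirements of this guide.

  # One-to-one local repair: a detour is pre-signaled at every node along the LSP
  set protocols mpls label-switched-path dc1-to-dc2 fast-reroute
  # Or facility backup: protect the shared link with a bypass LSP
  set protocols rsvp interface xe-0/0/0.0 link-protection
  set protocols mpls label-switched-path dc1-to-dc2 link-protection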

Agility and Virtualization


As the pace of global economic activity continues to accelerate, organizations must be able to respond quickly to changes in demand and other market conditions. In the context of data center design, agility refers to the ability to rapidly and efficiently adapt to changes in demand for applications and services. As end users change usage patterns, demand for some applications will grow, stressing processing capacity and network connectivity, while demand for other applications and services may decline, leaving other resources underutilized. Agile data center designs support changes in usage patterns by allowing data center managers to add or move application resources quickly and easily without disrupting operations.

Several layers of planning are involved in designing an agile infrastructure. Because physical infrastructure changes have the longest lead time in the data center, the physical layout should be designed for capacity and growth with an extended time horizon, so that responding to changes in usage does not require reconfiguration of the physical infrastructure. Moving up from the physical layer, beginning with L2, agility is enhanced through the use of virtualization techniques. Virtualization is an essential element of the data center's agility strategy: it allows individual physical resources to function as multiple logical resources, transparently to users. Designers can take advantage of a variety of virtualization elements, including VLANs, virtual routers, Virtual Chassis, logical systems, logical routers, MPLS VPNs, and security zones.

Logical Systems
A logical system is a partition of a physical router and can contain multiple routing instances and routing tables. Traditionally, a service provider network design required multiple layers of switches and routers to transport packet traffic between customers. As seen on the left side of Figure 23, access devices are connected to edge devices, which are in turn connected to core devices.
Figure 23. Network with logical systems (left: network topology without logical systems, eight separate physical devices; right: network topology with logical systems, one physical router with eight logical devices)


This complexity can lead to challenges in maintenance, configuration, and operation. To reduce such complexity, Juniper supports logical systems. Logical systems perform a subset of the actions of the main router and have their own unique routing tables, interfaces, policies, and routing instances. As shown on the right side of the figure, a set of logical systems within a single router can handle the functions previously performed by several small routers. With Junos OS, you can partition a single router into multiple logical devices that perform independent routing tasks. Because logical systems perform a subset of the tasks once handled by the main router, they offer an effective way to maximize the use of a single routing or switching platform.
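A minimal sketch of partitioning a router into two logical systems follows; the logical system names, interfaces, and addresses are hypothetical values chosen for illustration. Each logical system gets its own interfaces, protocol configuration, and routing tables.

  # Logical system acting as a core router
  set logical-systems ls-core interfaces xe-0/0/1 unit 0 family inet address 10.10.1.1/30
  set logical-systems ls-core protocols ospf area 0.0.0.0 interface xe-0/0/1.0
  # Logical system acting as an edge router, isolated from ls-core
  set logical-systems ls-edge interfaces xe-0/0/2 unit 0 family inet address 10.10.2.1/30
  set logical-systems ls-edge protocols ospf area 0.0.0.0 interface xe-0/0/2.0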

Virtual Routers
Virtual routers allow a single device to maintain the resources of multiple routing devices. In the router case, the most important property is that each virtual router maintains its own separate routing table, so the virtual routers are isolated from each other. You can specify which routing protocols and which interfaces on a device participate in each virtual router.
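A minimal Junos-style sketch of a virtual router routing instance follows, with a hypothetical tenant name and interface; routes learned inside the instance stay in its own routing table.

  # Dedicated routing table and protocol state for one tenant or department
  set routing-instances tenant-a instance-type virtual-router
  set routing-instances tenant-a interface ge-0/0/3.0
  set routing-instances tenant-a protocols ospf area 0.0.0.0 interface ge-0/0/3.0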

Logical Routers
A logical router is a partition of a physical router and can contain multiple routing instances and routing tables. Logical routers perform a subset of the actions of a physical router with their own unique routing tables, interfaces, policies, and routing instances. A set of logical routers within a single router can handle the functions previously performed by several small routers, and a logical router can contain multiple virtual router routing instances. Logical routers enhance agility in the data center by supporting expansion and reconfiguration through the straightforward application of routing policies within the logical routers.

Virtual Chassis
A feature of the EX4200 devices, Virtual Chassis allows the interconnection and operation of multiple switches as a unified, single, high-bandwidth device. In a Virtual Chassis configuration, all member switches are managed and monitored as a single logical device. This approach simplifies network operations, allows the physical placement of devices to be separated from their logical grouping, and provides efficient use of resources. Management of the Virtual Chassis configuration is performed through the master switch. A virtual management Ethernet (VME) interface allows remote management through a single IP address by connecting to the out-of-band management port of any member switch. Data center staff can easily add devices to a Virtual Chassis without disrupting operations or modifying the underlying infrastructure.
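A minimal sketch of a preprovisioned three-member Virtual Chassis with a single management address on the VME interface follows; the serial numbers and management address are placeholders for illustration only.

  # Preprovisioned Virtual Chassis: member roles are fixed by serial number
  set virtual-chassis preprovisioned
  set virtual-chassis member 0 serial-number ABC0000001 role routing-engine
  set virtual-chassis member 1 serial-number ABC0000002 role routing-engine
  set virtual-chassis member 2 serial-number ABC0000003 role line-card
  # Single management IP address for the whole Virtual Chassis
  set interfaces vme unit 0 family inet address 192.168.100.10/24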

VLANs
By logically segmenting traffic according to organization, function, or resource usage, data center operators can keep traffic flows efficient without changing the physical equipment configuration. Security policies and broadcast activity can be controlled on a per-VLAN basis, and changes in usage patterns can be readily accommodated by changes in VLAN assignment. VLANs are highly scalable within and across data centers.
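A minimal EX Series sketch follows, using the pre-ELS CLI syntax of this guide's era and hypothetical VLAN names, IDs, and ports; it defines two VLANs, a server-facing access port, and a trunk uplink.

  # Define the VLANs
  set vlans finance vlan-id 110
  set vlans hr vlan-id 120
  # Server-facing access port in a single VLAN
  set interfaces ge-0/0/10 unit 0 family ethernet-switching port-mode access vlan members finance
  # Trunk uplink carrying both VLANs toward the core
  set interfaces ge-0/0/47 unit 0 family ethernet-switching port-mode trunk vlan members [ finance hr ]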

Security Zones
Security zones are logical entities to which one or more interfaces are bound. With many types of Juniper Networks devices, you can define multiple security zones to meet specific network needs. On a single device, multiple security zones divide the network into segments to which you can apply different security policies to satisfy the needs of each segment. Security zones are the building blocks for policies: they provide a means of distinguishing groups of hosts (user systems and other hosts such as servers) and their resources from one another in order to apply different security measures to them. Security zones can also be defined to perform specific functions, such as management zones for host management interfaces. Security zones enhance agility by allowing data center operators to apply fine-grained network security design without deploying multiple security appliances.
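A minimal SRX Series sketch follows, with hypothetical zone names, interfaces, and a single illustrative policy; traffic between zones is dropped unless a policy explicitly permits it.

  # Bind interfaces to zones
  set security zones security-zone web-tier interfaces reth0.10
  set security zones security-zone db-tier interfaces reth0.20
  # Allow only database traffic from the web tier to the database tier
  set security policies from-zone web-tier to-zone db-tier policy allow-db match source-address any
  set security policies from-zone web-tier to-zone db-tier policy allow-db match destination-address any
  set security policies from-zone web-tier to-zone db-tier policy allow-db match application junos-ms-sql
  set security policies from-zone web-tier to-zone db-tier policy allow-db then permit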


MPLS VPNs
By adopting MPLS open standards and their large-scale efficiency to support network virtualization for the interconnect networks between data center locations, data center designers can extend network segments across data center locations using L3 VPN technology, which supports dynamic discovery and minimizes operational configuration. Figure 24 shows how VPLS supports switching across data centers. In this example, each of two data centers has several server and storage pools that connect to EX Series access devices in a Virtual Chassis arrangement and in turn to MX Series core devices. Upstream, the core devices also provide MPLS WAN connectivity to the outside world, including the second data center. VRRP is included for redundancy at the core level.
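For the L3 VPN case, a minimal sketch of the per-segment VRF on the MPLS edge router follows; the instance name, interface, route distinguisher, and route target are placeholders. A matching vrf-target on the remote data center's edge router is what extends the segment automatically.

  # VRF for one data center network segment (VLAN 100 mapped into the VPN)
  set routing-instances dc-segment-1 instance-type vrf
  set routing-instances dc-segment-1 interface ge-1/0/0.100
  set routing-instances dc-segment-1 route-distinguisher 65000:1
  set routing-instances dc-segment-1 vrf-target target:65000:1
  set routing-instances dc-segment-1 vrf-table-label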

Figure 24. VPLS switching across data centers (in each of Data Center 1 and Data Center 2, EX Series Virtual Chassis access switches connect to MX Series core devices, with Mirroring VLAN 1 and Mirroring VLAN 2 extended between the two sites over the MPLS core)


Figure 25 shows how VLANs, security zones, and MPLS VPNs combine to enhance agility in the data center. First, VLANs are added to the basic network. Security zones are then formed to apply policies to groups of VLANs. Finally, MPLS VPNs are added to create virtual network connections within the data center and between data centers.


Figure 25. Agility designs for the data centers (VLANs are grouped into security zones 1 through 4 within each data center, and MPLS VPNs and VPLS VPNs interconnect the data centers)


For example, an enterprise supporting a large Internet portal with Web 2.0 applications may have hundreds of multitiered (n-tiered) applications with complex interconnections between clients, database servers, firewalls, storage, and other devices. As traffic grows over time, interconnections based on a traditional physically layered architecture become increasingly complex and create scaling challenges, as shown on the left side of Figure 26.

Figure 26. Traditional versus virtual appliance architecture (left: traditional data center architectures and secure layering across DMZ, web, application, AAA, NOC, database, Exnet, and NAS tiers; right: next-generation virtual data center architecture with consolidated web firewalls (SRX5800), consolidated scalable, high-performance routers (MX960), EX Series/QFX3500 switches, and a network virtualization layer)


Challenges include the following:
- Users require rapid, secure access to large volumes of distributed data for multitiered applications.
- Numerous interconnections in the traditional, layered physical data center architecture lead to network utilization inefficiencies.

The data center designer can address these challenges in the following ways:
- Improve secure access to large volumes of distributed data by moving from a traditional, layered physical architecture to a virtual architecture
- Improve network utilization with a collapsed two-tier network architecture

The simplified virtual architecture shown on the right of Figure 26 decouples the network architecture from the application deployment architecture. Any-to-any connectivity is provided between end users and application services. This is achieved by introducing a virtualization layer that decouples the network resources from the application services, allowing applications to be transparent to the underlying network resources. After decoupling, network service virtualization can be mapped into virtual security zones or trust zones in the Juniper Networks SRX Series Services Gateways, providing the same or higher levels of security than the traditional architecture.

Figure 27 shows a simplified data center in which the network resources and applications are decoupled. Core MX Series routers connect to SRX Series platforms that have virtual security services configured in independent security zones. The top of rack deployment is configured to use Virtual Chassis. The WAN edge connects the data center to the outside world.

Figure 27. Simplified data center (M Series routers at the WAN edge; a consolidated core layer of MX Series and EX Series devices; VRFs on the core mapped to routing instances and security zones (firewall, IPS, NAT) on the SRX5800, with adjacency established between VRFs on the core and traffic between networks running through the SRX Series by default or filtered on the MX Series; and an EX4200 access layer carrying HR, Finance, and Guest department VLANs over trunks, VPNs, and server VLANs)


Data for different departments (for example, human resources, finance, or guest) is hosted on different data center servers. The traffic to and from the departments is separated into different VPNs. A virtual routing and forwarding (VRF) table can be configured to send specific VPN traffic to virtual security zones that contain intrusion prevention system (IPS), Network Address Translation (NAT), and firewall services, while other VPN traffic can be directed to its destination without further processing. Multiple security zones can apply specific policies to the VPN traffic, which may traverse multiple security zones inside the SRX Series before being sent to its destination VPN.
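A minimal sketch of this mapping follows, with hypothetical instance, zone, and interface names: a department VRF on the core hands traffic to an SRX Series zone whose policy applies IPS, and source NAT is configured separately for the same zone pair. A complete design would repeat the pattern per department; only one direction is shown.

  # Department VRF on the core, handing traffic to the SRX Series over a dedicated interface
  set routing-instances hr-vrf instance-type virtual-router
  set routing-instances hr-vrf interface xe-0/0/5.110
  # On the SRX Series: bind the matching interface to a zone and inspect permitted traffic with IPS
  set security zones security-zone hr-zone interfaces reth1.110
  set security policies from-zone hr-zone to-zone dc-core policy hr-inspect match source-address any
  set security policies from-zone hr-zone to-zone dc-core policy hr-inspect match destination-address any
  set security policies from-zone hr-zone to-zone dc-core policy hr-inspect match application any
  set security policies from-zone hr-zone to-zone dc-core policy hr-inspect then permit application-services idp
  # Source NAT for the same zone pair
  set security nat source rule-set hr-outbound from zone hr-zone
  set security nat source rule-set hr-outbound to zone dc-core
  set security nat source rule-set hr-outbound rule hr-nat match source-address 0.0.0.0/0
  set security nat source rule-set hr-outbound rule hr-nat then source-nat interface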

Capacity Planning, Performance, and Scalability


In existing compute and network environments, planning for growth and change is a costly and time-consuming effort, yet a successful organization must be able to scale readily and cost-effectively. To data center users, applications and services are the means to an end, that is, the ability to obtain information, complete transactions, or perform a job. High performance is essential to employee productivity, customer satisfaction, and the enterprise's bottom line. A successful data center design must provide sufficient initial capacity and support fast and cost-effective scaling, even when capacity limits are reached within existing data centers. It is important for data center designers to consider several metrics when designing for capacity, performance, and scalability.

Throughput
Each network device has a maximum capacity, or throughput, which is the maximum rate at which it can pass traffic across the backplane and to interface cards. Total throughput depends on the interface capacity (10GbE today, with 40GbE/100GbE in the near future). The backplane capacity determines the forwarding capacity of the device and typically scales to a few terabits per second. Data center designers should consider the non-blocking performance of the device to calculate the maximum achievable network throughput.

Oversubscription
Oversubscription refers to the relationship between available capacity and the potential demands on that capacity. For example, if an application serves 100 users and the available throughput supports 50 simultaneous users, then access to the application is oversubscribed at a ratio of 2:1. Traditionally, oversubscription was acceptable because links were typically underutilized and the application server itself was the bottleneck. With virtualization, link utilization is much higher, and it is more likely that link capacity will max out under oversubscription. Whether oversubscription becomes a significant problem depends on the nature of the application, the expected number of simultaneous access sessions, application performance requirements, and predictions of how these metrics will change over time. The data center designer should take all of these factors into account in capacity planning. Specifically, to reduce oversubscription, the core tier should be non-blocking and provide line-rate (wire-rate) forwarding.

Latency
Latency refers to the delays that occur in the network due to a variety of factors, including the number of links, processing, and buffering. Even small individual delays can add up to poor user perception of performance and loss of productivity. We have previously discussed the advantages of multitier designs, which support scalability and resiliency. An unintended consequence of these designs can be increased latency, because traffic from one server to another must pass through the access tier to the core tier and then back down to the access tier to reach the destination server. Data center designers can reduce latency and increase performance in multitier designs by deploying Juniper's Virtual Chassis at the access tier. Server-to-server traffic within the Virtual Chassis domain does not need to travel up to the core tier. As shown in Figure 28, traffic can flow between the servers in Rack 1 and Rack 2 through the Virtual Chassis without traveling up to the core.


Modular Scalability
We previously discussed virtualization techniques and how they allow data center operators to add and move resources without adding or reconfiguring physical devices. Even with maximal use of virtualization techniques, it will be necessary to add physical capacity to support growth. Juniper products are based on modular designs and support easy addition of physical capacity on demand, without service disruption. Designers should specify chassis capacity to support long-term growth, allowing data center operators to add cards to the chassis as needed.

Port Capacity
Port capacity, combined with throughput, determines the volume of traffic that a network device can support. Juniper products are designed with multiple 1GbE and 10GbE ports for highly scalable port capacity. Designers can add or reconfigure ports to be consistent with objectives for throughput, latency, and performance.

Software Configuration
In addition to the equipment considerations discussed in this section, software sizing is an important factor in planning for capacity, performance, and scaling. The MAC table, routing table, ARP table, number of multicast routes, and number of access control lists (ACLs) all constrain the amount of traffic that can pass through a device. Other factors include the number of supported VLANs, virtual routers, logical routers, and logical systems. Designers should plan for sufficient capacity for all of these elements.

Solution Implementation: Sample Design Scenarios


This design guide has presented a comprehensive set of guidelines for data center design. We now consider how the guidelines can be applied in practice to two different data center types: an enterprise data center and a transactional data center.

Scenario 1: Enterprise Data Center


The company is a leading supplier in the IT infrastructure industry. With the emergence of server virtualization, and with more servers being consolidated into fewer data centers, the company's data centers are playing a more significant role than ever before. The company is attempting to automate as many organizational processes as possible while dealing with very tight budgets and many new applications. Current and legacy platform technologies are also presenting major challenges to the networking teams. These challenges have imposed a new set of requirements on the company's networking infrastructure.

Workloads in the enterprise IT environment are expected to migrate across data center locations, making it necessary to adjust the allocation of resources to specific workloads. The network must support the process of growing and changing workload distribution and deliver the new workloads to applications immediately and reliably. Security functionality must be able to work per application and per business, and must therefore incorporate advanced segmentation and policy capabilities. The company must also support applications running across multiple sites, retaining the same security settings across sites and maintaining Layer 2 connectivity between nodes across sites.

Complexity is a major concern as the company works to support the latest enterprise application and platform technologies. At scale, the complexity of application and platform support with multiple technologies becomes daunting. Supporting server virtualization, mashup applications, and the integration of public cloud services into a hybrid cloud model involves a variety of technical challenges that are difficult to meet at the network level.


To thrive in today's dynamic and complex world, the company needs a simple, secure, easy to operate, and no-compromise data center networking solution.

Requirements:
- Operational simplicity through simplified design
- Simplified services integration
- High-performance network
- Reduced total cost of operation

Now let's look at the details of these requirements. The data center network must deliver high performance to meet burgeoning bandwidth requirements for high-performance compute clusters, reduce application response time, and increase network efficiency. Network performance to support applications must be seamless across multiple data centers and incorporate high availability features for resiliency.

The IT organization must continually work to improve productivity while remaining under tight budgetary constraints. To meet these challenges, the organization should adopt a data center design that incorporates operational simplicity with a single network operating system and a standard management interface. A simplified design that eliminates spanning tree complexity will improve network performance, reduce latency, and increase operational efficiency. With the simplified data center design, security must not be compromised; simple and effective security services integration is needed to meet intra-data center security and compliance requirements. Because the IT organization is squeezed between growing demands for data center services and ever tightening budgets, it is necessary to adopt designs and products that meet technical requirements while also reducing total cost of operation. The IT organization can also address cost and environmental issues by adopting strategies that reduce power consumption and waste.

Design:
- Physical device placement. The data center has some applications that run on virtual servers with 1GbE connections and is also transitioning to a 10GbE compute environment with highly virtualized servers. We choose the top of rack deployment model, which is recommended because it reduces cabling complexity and provides an economical solution for 10GbE server connectivity.
- Physical network topology. The enterprise depends on business applications that must be highly available to application users, so the resiliency of the IT infrastructure is essential and critical. The access-core mesh described in Access-Core Mesh Design earlier in this guide provides dual-homed connectivity from each access device to two core devices. This topology is recommended because it provides the needed resiliency while also supporting legacy single-homed server deployments.
- Logical network topology. Because we have chosen the access-core full mesh deployment, it is beneficial to provide load balancing across both uplinks from the access tier. This design reduces oversubscription while making efficient use of network bandwidth, and it is achieved by configuring Layer 3 routing between the access and core tiers. The EX Series switches deployed at the access tier support full L3 functionality, which is leveraged in this design to provide high performance with a very low oversubscription ratio and a highly resilient network design. A pair of access devices is interconnected, and VRRP is configured to provide default gateway redundancy (a configuration sketch follows this design list).
- EX4200 Virtual Chassis. The data center runs high-performance compute cluster-based applications with computing resources that span multiple racks. Virtual Chassis is recommended because it provides efficient communication, supporting multiple racks with high-speed backplane connectivity. Because the high-speed backplane connections are used for intra-cluster communication, uplinks can be used primarily for user-to-server communication, thereby reducing oversubscription.
- QFX3500. The design uses QFX3500 switches to support the 10GbE server environment with non-blocking, low latency network connections.


- EX8200 line. The design uses the EX8200 line of high-performance switches as non-blocking core platforms to avoid network congestion, reduce oversubscription on access tier-to-core tier network connections, and reduce application response times.
- SRX3600. A cluster of Juniper Networks SRX3600 Services Gateway high-performance security platforms is integrated at the core tier to provide security services for intra-data center communication and to meet compliance and security requirements.

Figure 28 shows the enterprise data center design.
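The following is a minimal sketch of the default gateway redundancy and uplink load balancing called for under logical network topology above, as configured on one of the paired EX Series access switches. The VLAN, addresses, and interface names are placeholders chosen for illustration, not values from this guide.

  # Routed VLAN interface with VRRP providing the servers' default gateway
  set interfaces vlan unit 100 family inet address 10.1.100.2/24 vrrp-group 100 virtual-address 10.1.100.1
  set interfaces vlan unit 100 family inet address 10.1.100.2/24 vrrp-group 100 priority 200
  # L3 uplinks, one to each core switch
  set interfaces xe-0/1/0 unit 0 family inet address 10.0.0.1/31
  set interfaces xe-0/1/1 unit 0 family inet address 10.0.0.3/31
  set protocols ospf area 0.0.0.0 interface xe-0/1/0.0
  set protocols ospf area 0.0.0.0 interface xe-0/1/1.0
  set protocols ospf area 0.0.0.0 interface vlan.100 passive
  # Load-balance traffic across equal-cost paths toward the core
  set policy-options policy-statement ecmp then load-balance per-packet
  set routing-options forwarding-table export ecmp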

Figure 28. Use Case: Enterprise data center (MX Series at the edge; EX8200 and MX960 at the core with an SRX3600 cluster and ECMP toward the access tier; EX4200 and QFX3500 at the access tier, which forms the L2/L3 boundary)

Scenario 2: Transactional Data Center
This company is a major player in the financial services industry. With automated computation and trading essential to financial services profitability, the level of dependence on high-speed computation and market data feeds from a variety of sources is increasing. At high scale, every microsecond and nanosecond in a transaction counts for profit or loss, and there is no tolerance for downtime. The company must be able to deliver line-rate data with the highest levels of security and assurance.

Supporting transactions, market data, and algorithmic trading applications at the network level involves a challenging set of technical requirements. The compute infrastructure spans applications, platform drivers, and the physical networking devices that are deployed across multiple data center locations. For the compute infrastructure to operate efficiently, performance at all levels must be highly predictable and reliable. In the transactional compute space, the speed at which market data is delivered from the source to the trader or system is critical for timely decisions, and the speed at which decisions arrive at the target exchange affects the relative advantage of the request. The variation between different transaction round trip times is also significant.


The company needs an infrastructure with low variation (or jitter) that can translate into a predictable and fair trading system delivering the same high-performance experience to all clients. Moreover, the high speed and low variability must be maintained as the transactional applications are deployed across multiple data center locations to support high availability and disaster recovery. Any solution must also be easily scalable to allow for continued growth in the client population and applications.

The company is looking for a data center design that can support the required speed, predictability, low variability, and reliability to successfully implement its trading platforms, algorithmic trading applications, and market data distribution systems. Given the inherent complexity of the company's networking infrastructure, it is essential to adopt simplified data center design principles and a simplified and effective management solution. While the company's IT organization does have some budgetary flexibility to deploy a comprehensive solution, it must still control total cost of operations to ensure that the major focus remains on delivering the services that contribute directly to the company's bottom line.

Requirements:
- Low latency, high-performance network
- High resiliency network, including L2 extension across data centers
- Operational simplicity
- Reduced cost of operations

Now let's look at the details of these requirements. The transactional data center is less burdened by legacy equipment than a typical enterprise data center, so the data center design can focus on optimizing connectivity in the 10GbE environment with heavy use of virtualization. Latency must be minimized at every level, and attention must be paid to the number of links and the amount of traffic redirection. The design must be more predictable and less fluid than in situations where latency and variability are not critical. There is no tolerance for downtime in the transactional data center, so the design must support high availability while reducing complexity (such as spanning tree complexity). The network must meet bandwidth requirements for high-performance compute clusters to reduce application response times and increase network efficiency. L2 connectivity is required to support the transactional applications that are deployed across multiple data centers to meet high availability and disaster recovery requirements. A design that incorporates the operational simplicity of a single network operating system and a standard management interface is necessary to allow the IT staff to focus on the performance-related aspects of the data center network without having to continually address multiple operating systems and management interfaces. Although the transactional data center is typically not as subject to operational cost considerations as a standard enterprise data center, keeping total costs under control allows the organization to meet the primary data center performance objectives. This can extend to the reduction of power and waste in the physical data center environment.

Design:
- Physical device placement. The top of rack design reduces cabling complexity and limits the latency of server-to-access device connections. It is recommended to support the 10GbE server environment.
- Physical network topology. In the financial transaction network, performance and latency are the key requirements. We need to design the network to provide deterministic low latency with high resiliency.
- Logical network topology. The access-core inverse U design provides a pair of access devices, each of which is connected to a different core device for resiliency. The network architect can minimize latency through configuration of active and standby links on a fixed set of access and core devices. This simplifies the network design and provides deterministic low latency for transaction data flows.
- QFX3500. The design uses QFX3500 devices to provide connectivity in the 10GbE server environment with non-blocking, ultra low latency network connections.
- EX8200. The design uses EX8200 high-performance devices as non-blocking core platforms to reduce network congestion, oversubscription on access tier-to-core tier network connections, and application response times.


- MX960. The design uses the MX960 to provide inter-data center connectivity with advanced routing capabilities such as MPLS/VPLS to extend the L2 connection across multiple data center locations. Advanced QoS and traffic engineering capabilities are used to meet the performance requirements for data replication and application cluster communication.
- SRX3600. The SRX3600 high-performance security platform cluster is connected to the MX960 to provide secure access from the WAN.

Figure 29 shows the transactional data center design.

Figure 29. Use Case: Transactional data center (MX Series at the edge, EX8200 and MX960 at the core, and QFX3500 at the access tier)

Summary
With the development of cloud computing, the network has become the cornerstone of the information age infrastructure. Everything now depends on network communications: within data center systems, between data centers, and between data centers and users. Over the past decade, network complexity has grown exponentially in tandem with the growth in demand for global 24x7x365 services. Distributed processing, logical systems, virtual machines, multitier architectures, and inter-data center communications all place extreme burdens on traditional networking infrastructures, and new approaches are essential to the introduction and success of cloud data centers.

As a visionary leader in cloud computing, Juniper Networks is uniquely positioned to deliver the most innovative, comprehensive, efficient, and cost-effective solutions to the challenges faced by cloud data center designers. Designers can select from the complete Juniper family of access and core network devices to create flexible physical layouts and topologies that meet their most exacting requirements. By including Virtual Chassis in the mix, designers can enhance resiliency and high performance even as they simplify the overall network topology.


Resiliency is integral to Juniper data center solutions at all levels, including the application, server, network device, network OS, and physical levels. Designers can choose from a robust set of optimized resiliency designs that operate consistently across all Juniper products. Simplicity and modularity are designed into all Juniper solutions: beginning with individual segments that are optimized for performance and reliability, designers can create data centers of any size and connect multiple data centers for capacity and resiliency. Protocol solutions such as MPLS can be used to interconnect data center network segments within a single data center location or across multiple locations. Finally, Junos OS ties everything together with a common code base, modular design, nondisruptive upgrades, and a growth path that can effectively meet all of the emerging challenges and opportunities of the new world of cloud computing.

About Juniper Networks


Juniper Networks is in the business of network innovation. From devices to data centers, from consumers to cloud providers, Juniper Networks delivers the software, silicon, and systems that transform the experience and economics of networking. The company serves customers and partners worldwide. Additional information can be found at www.juniper.net.

Corporate and Sales Headquarters Juniper Networks, Inc. 1194 North Mathilda Avenue Sunnyvale, CA 94089 USA Phone: 888.JUNIPER (888.586.4737) or 408.745.2000 Fax: 408.745.2100 www.juniper.net

APAC Headquarters Juniper Networks (Hong Kong) 26/F, Cityplaza One 1111 King's Road Taikoo Shing, Hong Kong Phone: 852.2332.3636 Fax: 852.2574.7803

EMEA Headquarters Juniper Networks Ireland Airside Business Park Swords, County Dublin, Ireland Phone: 35.31.8903.600 EMEA Sales: 00800.4586.4737 Fax: 35.31.8903.601

To purchase Juniper Networks solutions, please contact your Juniper Networks representative at 1-866-298-6428 or an authorized reseller.

Copyright 2011 Juniper Networks, Inc. All rights reserved. Juniper Networks, the Juniper Networks logo, Junos, NetScreen, and ScreenOS are registered trademarks of Juniper Networks, Inc. in the United States and other countries. All other trademarks, service marks, registered marks, or registered service marks are the property of their respective owners. Juniper Networks assumes no responsibility for any inaccuracies in this document. Juniper Networks reserves the right to change, modify, transfer, or otherwise revise this publication without notice.

8020014-002-EN

Feb 2011

Printed on recycled paper
