updated.
The advantages of a “glueless” architecture:
- no specific development or expertise is required from the server manufacturer; every server maker can build an 8-socket server
- as a result, 4-socket and 8-socket servers also cost less
The disadvantages of a “glueless” architecture:
- the TCO goes up when scaling out
- limited to 8-socket servers
- cache coherency becomes harder to maintain as the socket count increases
- performance does not scale linearly
- the price/performance ratio decreases
- efficiency is not optimal when running large VMs
- up to 65% of Intel QPI link bandwidth is consumed by the QPI source broadcast snoopy protocol
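To give a feel for why coherency traffic grows with the socket count, here is a minimal sketch of a toy source-broadcast model. It assumes that every cache miss sends one snoop request to each other socket and receives one response from each; the function name and the model itself are illustrative assumptions, not Intel's actual protocol accounting, and the sketch does not try to reproduce the 65% figure.

```python
def snoop_messages_per_miss(sockets: int) -> int:
    """Interconnect messages for one cache miss in a simplified
    source-broadcast model (assumption): one snoop request to every
    other socket plus one response back from each of them."""
    if sockets < 1:
        raise ValueError("sockets must be >= 1")
    peers = sockets - 1
    return 2 * peers


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(f"{n} sockets: {snoop_messages_per_miss(n)} snoop messages per miss")
```

In this toy model the snoop traffic per miss grows linearly with the number of sockets, while the data actually requested stays the same size, so a growing share of link bandwidth goes to coherency messages.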
The primary workloads affected by the Intel QPI source broadcast snoopy issue are:
- Java applications
- large databases
- latency-sensitive applications
Useful reading about snoopy protocols (MESI/MESIF/QPI):
Glued architecture
BCS2 provides 7 XQPI links to connect to up to 7 other modules, building a system of up to 16 sockets.
Bandwidth:
• 1 XQPI link: 14 GT/s in each direction
• 1 transfer = 2 bytes, so 14 GT/s = 28 GB/s = 224 Gb/s
• Transfer rate between 2 modules:
  • 4 sockets (4 XQPI links): equivalent to ~88x 10 GigE ports
  • 8 sockets (2 XQPI links): equivalent to ~44x 10 GigE ports
  • 16 sockets (1 XQPI link): equivalent to ~22x 10 GigE ports
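The arithmetic behind these figures can be checked with a short sketch. The constants come from the list above; the function and variable names are only illustrative.

```python
GT_PER_S = 14            # giga-transfers per second per XQPI link, each direction (from the text)
BYTES_PER_TRANSFER = 2   # from the text
TEN_GIGE_GBPS = 10       # nominal rate of one 10 GigE port

# XQPI links between two modules for each system size (from the text)
LINKS_BETWEEN_MODULES = {4: 4, 8: 2, 16: 1}


def link_gbps(gt_per_s: int = GT_PER_S, bytes_per_transfer: int = BYTES_PER_TRANSFER) -> int:
    """Raw bandwidth of one XQPI link in Gb/s, per direction."""
    return gt_per_s * bytes_per_transfer * 8


def module_to_module_gbps(sockets: int) -> int:
    """Raw bandwidth between two modules in Gb/s, per direction."""
    return LINKS_BETWEEN_MODULES[sockets] * link_gbps()


if __name__ == "__main__":
    for sockets in sorted(LINKS_BETWEEN_MODULES):
        gbps = module_to_module_gbps(sockets)
        print(f"{sockets} sockets: {gbps} Gb/s, about {gbps / TEN_GIGE_GBPS:.1f} x 10 GigE")
```

The raw line rates work out to 896, 448 and 224 Gb/s, slightly above the rounded ~88, ~44 and ~22 port equivalents quoted above.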
2) HPE Superdome X sx3000 crossbar - up to 8 hops, 486 ns
3) SGI NUMAlink 7 - up to 500 ns
4) Huawei KunLun - NCM > 2 hops
5) Ubox XNC - eXternal Node Controller/UNC - 3rd generation of BCS - Bullion Sequana S1600 - S3200.
The UBox is a 5U chassis embedding several UPI Node Controllers (UNC). The UNC is the 6th generation of eXternal Node Controller (XNC) designed and developed by Atos for Intel processor-based servers. It is a VLSI-type (Very Large-Scale Integration) integrated circuit derived from mainframe technologies and tuned for High Performance Computing. This innovative and unique Atos technology makes it possible to interconnect up to sixteen 2-socket modules, scaling up to 32-socket SMP systems in a Cache Coherent Non-Uniform Memory Access (CC-NUMA) architecture.
up to 8 nodes - classic Intel glueless -
8+ nodes
Advantages:
- with UBox: 4+ socket servers with Intel Xeon Gold processors
- transparent Cascade Lake support
2 hops in the 16-socket Bullion Sequana S1600 system, at full bandwidth.
Topology: full mesh
Frankly, it is definitely not the 3rd generation of BCS. It is a third-party solution, a product of Numascale.
Core technology: SCI - Scalable Coherent Interface.
https://www.numascale.com/index.php/scale-up-servers/
[Figures: Numascale topologies: single socket, single rail; dual socket, single rail; quad socket; single, dual, quad and eight chassis; 8 sockets with double and single data planes]
To meet customer application requirements, 2 types of UBox models can be proposed:
• Enterprise: this is the standard configuration, providing an all-to-all topology between CPUs. It provides both the performance and the high availability needed for memory-intensive applications like SAP HANA.
• High Performance: well suited for High Performance Computing, doubling the bandwidth in the all-to-all topology between CPUs. It provides exceptional performance for CPU-intensive workloads.
The UBox is autonomous in terms of power, cooling and local management.
Enterprise mode:
High performance mode:
6) HPE Superdome Flex/NUMAlink 8 - up to 32 sockets - 400 ns
Advantages - 4+ socket server with Intel Xeon Gold processors
210 GB/s of crossbar bisection bandwidth at 8 sockets
425+ GB/s at 16 sockets
850+ GB/s at 32 sockets
Disadvantages - no transparent Cascade Lake support.
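As a rough check of how the Superdome Flex bisection figures scale, the sketch below normalises them per socket. The values are the ones quoted above (the "+" figures are treated as lower bounds); the names are illustrative only.

```python
# Crossbar bisection bandwidth in GB/s by socket count (from the text;
# the 16- and 32-socket values are quoted as "425+" and "850+").
BISECTION_GB_PER_S = {8: 210, 16: 425, 32: 850}


def per_socket_gb_per_s(sockets: int) -> float:
    """Bisection bandwidth divided by the number of sockets."""
    return BISECTION_GB_PER_S[sockets] / sockets


if __name__ == "__main__":
    for sockets in sorted(BISECTION_GB_PER_S):
        print(f"{sockets} sockets: {per_socket_gb_per_s(sockets):.2f} GB/s per socket")
```

The per-socket value stays at roughly 26 GB/s across all three sizes, which suggests that the quoted bisection bandwidth grows about linearly with the socket count.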
Glueless architecture
5) Intel Xeon Scalable 2-8 socket topologies - up to 2 hops, with bandwidth affected.
6) Lenovo 8-socket topology - up to 2 hops, with bandwidth affected.