Showing posts with label AMD. Show all posts

Monday, January 10, 2022

AWS adds EC2 instances for HPC workloads powered by AMD

AWS introduced new Amazon Elastic Compute Cloud (Amazon EC2) Hpc6a instances, powered by 3rd Gen AMD EPYC processors, for tightly coupled high performance computing (HPC) workloads. 

Hpc6a instances promise up to 65% better price performance compared to similar compute-optimized Amazon EC2 instances for HPC workloads and can be scaled to carry out complex calculations across a range of cluster sizes—up to tens of thousands of cores. 

Hpc6a instances are enabled with Elastic Fabric Adapter (EFA)—a network interface for Amazon EC2 instances—by default. With EFA networking, customers benefit from low latency, low jitter, and up to 100 Gbps of EFA networking bandwidth to increase operational efficiency and drive faster time-to-results for workloads that rely on inter-instance communications. Hpc6a instances are powered by 3rd Gen AMD EPYC processors that run at frequencies up to 3.6 GHz and provide 384 GB of memory. Using Hpc6a instances, customers can more cost-effectively tackle their biggest and most difficult academic, scientific, and business problems with HPC, and realize the benefits of AWS with superior price performance.

“By consistently innovating and creating new purpose-built Amazon EC2 instances for virtually every type of workload, AWS customers have realized huge price performance benefits for some of today’s most business-critical applications. While high performance computing has helped solve some of the most difficult problems in science, engineering, and business, effectively running HPC workloads can be cost-prohibitive for many organizations,” said David Brown, Vice President of Amazon EC2 at AWS. “Purpose-built for HPC workloads, Hpc6a instances now help customers realize up to 65% better price performance for their HPC clusters at virtually any scale, so they can focus on solving the biggest problems that matter to them most without the cost barriers that exist today.”

“We are excited to continue our momentum with AWS and provide their customers with this new, powerful instance for high performance computing workloads,” said Dan McNamara, Senior Vice President and General Manager, Server Business at AMD. “AMD EPYC processors are helping customers of all sizes solve some of their biggest and most complex problems. From small universities to enterprises to large research facilities, Hpc6a instances powered by 3rd Gen AMD EPYC processors open up the world of powerful HPC performance with cloud scalability to more customers around the world.”

Monday, November 8, 2021

AMD targets the accelerated data center

At a virtual event ahead of this week's Open Compute Project Summit, Dr. Lisa Su, president and CEO of AMD, introduced the company's Instinct MI200 series accelerators for high performance computing (HPC) and artificial intelligence (AI) workloads, and provided a preview of the 3rd Gen AMD EPYC processors. 

The 3rd Gen AMD EPYC processors with AMD 3D V-Cache, codenamed “Milan-X,” will leverage 3D chiplet packaging technology from TSMC.

AMD also confirmed that Meta will adopt its EPYC processors in its hyperscale data centers supporting Facebook, Instagram and other applications. AMD and Meta worked together to define an open, cloud-scale, single-socket server designed for performance and power efficiency, based on the 3rd Gen EPYC processor.

“We are in a high-performance computing megacycle that is driving demand for more compute to power the services and devices that impact every aspect of our daily lives,” said Dr. Su. “We are building significant momentum in the data center with our leadership product portfolio, including Meta’s adoption of AMD EPYC and the buildout of Frontier, the first U.S. exascale supercomputer which will be powered by EPYC and AMD Instinct processors."

Next up on the AMD roadmap are EPYC processors codenamed “Genoa” and “Bergamo.”

“Genoa” will have up to 96 high-performance “Zen 4” cores produced on optimized 5nm technology, and will support the next generation of memory and I/O technologies with DDR5 and PCIe 5. “Genoa” will also include support for CXL, enabling significant memory expansion capabilities for data center applications. “Genoa” is on track for production and launch in 2022.

“Bergamo” is a high-core count CPU, tailor made for cloud native applications, featuring 128 high performance “Zen 4c” cores. AMD optimized the new “Zen 4c” core for cloud-native computing, tuning the core design for density and increased power efficiency to enable higher core count processors with breakthrough performance per-socket. “Bergamo” comes with all the same software and security features and is socket compatible with “Genoa.” “Bergamo” is on track to ship in the first half of 2023.

Tuesday, March 16, 2021

AMD sees wide partner support for new EPYC server chips

AMD unveiled its EPYC 7003 Series CPUs, claiming the highest performance benchmark for a server processor with up to 19% more instructions per clock. 

The AMD EPYC 7003 Series Processors have up to 64 “Zen 3” cores per processor and introduce new levels of per-core cache memory, while continuing to offer PCIe 4.0 connectivity and class-leading memory bandwidth. 3rd Gen AMD EPYC processors also include modern security features through AMD Infinity Guard, supporting a new feature called Secure Encrypted Virtualization-Secure Nested Paging (SEV-SNP). SEV-SNP expands the existing SEV features on EPYC processors, adding strong memory integrity protection capabilities to help prevent malicious hypervisor-based attacks by creating an isolated execution environment.

“With the launch of our 3rd Gen AMD EPYC processors, we are incredibly excited to deliver the fastest server CPU in the world. These processors extend our data center leadership and help customers solve today’s most complex IT challenges, while substantially growing our ecosystem,” said Forrest Norrod, senior vice president and general manager, Data Center and Embedded Solutions Business Group. “We not only double the performance over the competition in HPC, cloud and enterprise workloads with our newest server CPUs, but together with the AMD Instinct GPUs, we are breaking the exascale barrier in supercomputing and helping to tackle problems that have previously been beyond humanity’s reach.”

AMD also cited broad industry support, including:

  • AWS – will add the AMD EPYC 7003 series processors to its core Amazon EC2 instance families later this year.
  • Cisco – introduced new Cisco Unified Computing System (Cisco UCS) rack server models with AMD EPYC 7003 Series Processors designed to support modern hybrid cloud workloads.   
  • Dell Technologies – announced the all new PowerEdge XE8545 server with AMD EPYC 7003 series CPUs, and the company will support the new processors within its PowerEdge server portfolio.
  • Google Cloud – announced AMD EPYC 7003 series processors will power a new compute optimized VM, C2D, and an expansion of the existing general purpose N2D VM later this year. Google Cloud Confidential Computing will be available on both C2D and N2D.
  • HPE – announced it will double the lineup of AMD EPYC processor powered solutions, using the AMD EPYC 7003 series processors in new HPE ProLiant servers, HPE Apollo systems and HPE Cray EX supercomputers.
  • Lenovo – added ten Lenovo ThinkSystem servers and ThinkAgile HCI solutions built on 3rd Gen EPYC processors, and achieved more than 25 new world records across a broad set of industry-standard benchmarks.
  • Microsoft Azure  – announced multiple new virtual machine offerings powered by AMD EPYC 7003 series processors. Azure HBv3 virtual machines for HPC applications are generally available today, and Confidential Computing virtual machines that utilize the full security features of the new AMD EPYC 7003 series processors are in private preview.
  • Oracle Cloud Infrastructure – announced it is extending its flexible virtual machine and bare metal compute offerings with the new E4 platform based on 3rd Generation AMD EPYC Processors.
  • Supermicro – introduced the AMD EPYC 7003 series processor in its Supermicro A+ single and dual socket family of Ultra, Twin, SuperBlade, Storage and GPU Optimized Systems.
  • Tencent Cloud – announced the new Tencent Cloud SA3 server instance, powered by the 3rd Gen AMD EPYC processors.  
  • VMware – announced its latest release of VMware vSphere 7 which is optimized to take advantage of AMD EPYC processors virtualization performance, while supporting the processors’ advanced security features, including SEV-ES for both virtual machine based and containerized applications.

Tuesday, October 27, 2020

AMD to acquire Xilinx for $35 billion

AMD agreed to acquire Xilinx in an all-stock transaction valued at $35 billion. The acquisition price represents approximately $143 per share of Xilinx common stock.

The deal significantly expands AMD’s product portfolio, which will now cover CPUs and GPUs, with Xilinx's FPGAs, Adaptive SoCs and software expertise. The combined company's addressable market will now include industry growth segments from the data center to gaming, PCs, communications, automotive, industrial, aerospace and defense.

“Our acquisition of Xilinx marks the next leg in our journey to establish AMD as the industry’s high performance computing leader and partner of choice for the largest and most important technology companies in the world,” AMD President and CEO Dr. Lisa Su said. 

“We are excited to join the AMD family. Our shared cultures of innovation, excellence and collaboration make this an ideal combination. Together, we will lead the new era of high performance and adaptive computing,” said Victor Peng, Xilinx president and CEO. “Our leading FPGAs, Adaptive SoCs, accelerator and SmartNIC solutions enable innovation from the cloud, to the edge and end devices. We empower our customers to deploy differentiated platforms to market faster, and with optimal efficiency and performance. Joining together with AMD will help accelerate growth in our data center business and enable us to pursue a broader customer base across more markets.”

Some highlights of the combined company

  • Dr. Lisa Su will lead the combined company as CEO
  • Xilinx President and CEO Victor Peng will join AMD as president, responsible for the Xilinx business and strategic growth initiatives
  • 13,000 engineers
  • $2.7 billion of annual R&D investment
  • Post-closing, current AMD stockholders will own approximately 74 percent of the combined company
  • Immediately accretive to AMD margins, cash flow and EPS
  • Combined revenue of $11.6B 

Tuesday, July 28, 2020

AMD's Q2 revenue rises 26% YoY to $1.93 billion

AMD reported Q2 revenue of $1.93 billion, up 26% YoY, operating income of $173 million, net income of $157 million and diluted earnings per share of $0.13. On a non-GAAP basis, operating income was $233 million, net income was $216 million and diluted earnings per share was $0.18.

"We delivered strong second quarter results, led by record notebook and server processor sales as Ryzen and EPYC revenue more than doubled from a year ago,” said Dr. Lisa Su, AMD president and CEO. "Despite some macroeconomic uncertainty, we are raising our full-year revenue outlook as we enter our next phase of growth driven by the acceleration of our business in multiple markets."

Computing and Graphics segment revenue was $1.37 billion, up 45 percent year-over-year and down 5 percent quarter-over-quarter. Revenue was higher year-over-year driven by strong Ryzen processor sales. The quarter-over-quarter decline was due to lower graphics processor sales.
Client processor average selling price (ASP) was up year-over-year driven by Ryzen processor sales.
Client processor ASP was down quarter-over-quarter due to a higher percentage of Ryzen mobile processor sales.
GPU ASP was lower year-over-year and quarter-over-quarter due to lower channel sales.
Enterprise, Embedded and Semi-Custom segment revenue was $565 million, down 4 percent year-over-year and up 62 percent quarter-over-quarter. Revenue was lower year-over-year due to lower semi-custom product sales largely offset by higher EPYC processor sales. The quarter-over-quarter increase was driven by higher EPYC processor and semi-custom product sales.

Wednesday, August 7, 2019

AMD debuts 2nd Gen EPYC processor, Google and Twitter deploy

AMD unveiled its 2nd Gen EPYC processors for data center servers, boasting up to 64 “Zen 2” cores built on leading-edge 7nm process technology. The new processors claim up to 83% better Java application performance and up to 43% better SAP SD 2-Tier performance than the competition, and provide world-record performance on real-time analytics with Hadoop.

For modern cloud and virtualization workloads, 2nd Gen AMD EPYC processors deliver world-record virtualization performance that redefines datacenter economics.

“Today, we set a new standard for the modern data center with the launch of our 2nd Gen AMD EPYC processors that deliver record-setting performance and significantly lower total cost of ownership across a broad set of workloads,” said Dr. Lisa Su, president and CEO, AMD. “Adoption of our new leadership server processors is accelerating with multiple new enterprise, cloud and HPC customers choosing EPYC processors to meet their most demanding server computing needs.”

Google has deployed 2nd Gen AMD EPYC processors in its internal infrastructure production data centers and in late 2019 will support new general-purpose machines powered by 2nd Gen AMD EPYC processors on Google Cloud Compute Engine. Twitter will deploy 2nd Gen AMD EPYC processors across its data centers later this year.  Microsoft announced the preview of new Azure virtual machines for general purpose applications, as well as limited previews of cloud-based remote desktops and HPC workloads based on 2nd Gen AMD EPYC processors.

Mellanox combines Ethernet and InfiniBand with AMD EPYC 7002

Mellanox Technologies has optimized its Ethernet and InfiniBand ConnectX smart adapters for the new AMD EPYC 7002 Series processor-based compute and storage infrastructures.

The 2nd Gen AMD EPYC processors, which support PCI Express 4.0, offer up to four times the peak FLOPS per socket of the AMD EPYC 7001 series. The large number of PCI Express 4.0 lanes enables direct connectivity to 24 NVMe storage drives plus Mellanox ConnectX 100 and 200 gigabit-per-second adapters while achieving full I/O throughput.

“The combination of Mellanox 25, 50, 100 and 200 Gigabit Ethernet and HDR 200 Gigabit InfiniBand adapters, and PCI Express 4.0 support in the second-generation AMD EPYC processor, provides high-performance computing, artificial intelligence, cloud and enterprise data centers the high data bandwidth they need for the most compute and storage demanding applications,” said Michael Kagan, Chief Technology Officer at Mellanox Technologies. “By leveraging our smart acceleration engines for In-Network Computing, virtualization, storage and security, our partners and customers can maximize the performance capabilities of the new second generation EPYC™ processors-based platforms.”

“Driven by AMD’s history of datacenter innovation, including 7nm process technology, the first x86 supplier to support PCIe 4.0, and embedded security features, the second generation AMD EPYC Processors set a new standard for the modern datacenter,” said Scott Aylor, corporate vice president and general manager, Datacenter Solutions Group at AMD. “We’re excited and thankful to have our partners, like Mellanox, supporting the launch of the second generation AMD EPYC processor. Working together we can enable our customers to transform their data center operations and deliver the breakthrough performance they need.”

Monday, November 6, 2017

Intel partners with AMD on Embedded Multi-Die Interconnect Bridge for GPUs

Intel announced a partnership with AMD to tie together its high-performance processors with discrete graphics processors using the Intel Embedded Multi-Die Interconnect Bridge (EMIB) technology along with a new power-sharing framework.
The goal is to reduce the usual silicon footprint to less than half that of standard discrete components on a motherboard.

The first implementation matches the new 8th Gen Intel Core H-series processor, second-generation High Bandwidth Memory (HBM2) and a custom-to-Intel third-party discrete graphics chip from AMD’s Radeon Technologies Group – all in a single processor package.

“Our collaboration with Intel expands the installed base for AMD Radeon GPUs and brings to market a differentiated solution for high-performance graphics,” said Scott Herkelman, vice president and general manager, AMD Radeon Technologies Group. “Together we are offering gamers and content creators the opportunity to have a thinner-and-lighter PC capable of delivering discrete performance-tier graphics experiences in AAA games and content creation applications. This new semi-custom GPU puts the performance and capabilities of Radeon graphics into the hands of an expanded set of enthusiasts who want the best visual experience possible.”

Monday, December 12, 2016

AMD Outlines Strategy for GPU-Accelerated Machine Learning

AMD outlined its strategy to accelerate the machine intelligence era in server computing through a new suite of hardware and open-source software offerings. Plans include new AMD Radeon Instinct GPU accelerators for deep learning inference and training.

Along with the new hardware offerings, AMD announced MIOpen, a free, open-source library for GPU accelerators intended to enable high-performance machine intelligence implementations, and new, optimized deep learning frameworks on AMD's ROCm software to build the foundation of the next evolution of machine intelligence workloads.

"Radeon Instinct is set to dramatically advance the pace of machine intelligence through an approach built on high-performance GPU accelerators, and free, open-source software in MIOpen and ROCm," said AMD President and CEO, Dr. Lisa Su. "With the combination of our high-performance compute and graphics capabilities and the strength of our multi-generational roadmap, we are the only company with the GPU and x86 silicon expertise to address the broad needs of the datacenter and help advance the proliferation of machine intelligence."

Sunday, October 16, 2016

Aliyun Looks to AMD for Cloud-based GPUs

AMD and Alibaba Cloud (Aliyun) announced a collaboration to strengthen research and cooperation related to the use of AMD Radeon Pro GPU technology in Alibaba Cloud’s global data centers.

“The partnership between AMD and Alibaba Cloud will bring both of our customers more diversified, cloud-based graphic processing solutions. It is our vision to work together with leading technology firms like AMD to empower businesses in every industry with cutting-edge technologies and computing capabilities,” said Simon Hu, president of Alibaba Cloud.

“The collaboration between AMD and Alibaba Cloud leverages the world-class technology and software engineering capabilities of both companies to meet the growing demand for standards-based GPU computing solutions capable of enabling more immersive and intuitive cloud services,” said AMD President and CEO Dr. Lisa Su. “Working closely with industry leaders like Alibaba Cloud helps ensure the investments AMD is making in our high-performance graphics and computing datacenter products continue to align with the needs of the broader cloud market.”

At this week's Computing Conference in Hangzhou, China, AMD is conducting the following demos:

  • An Alibaba Cloud Single Root Input/Output Virtualization (SR-IOV) Solution featuring AMD Radeon Pro server technology. The demo is powered by the Radeon FirePro™ S7150 x2 GPU featuring AMD Multi-user GPU (MxGPU) hardware-based server virtualization technology. The solution features the industry’s only hardware-virtualized GPU technology, which provides guaranteed service levels and improves security for remote workstation, cloud gaming, cloud computing, and Virtual Desktop Infrastructure (VDI) implementations.
  • A virtual reality (VR) experience demo powered by AMD Radeon VR Ready Premium graphics featuring AMD’s powerful, energy efficient Polaris graphics architecture.

Wednesday, June 3, 2015

Blueprint: Enabling Smart Software Defined Networks

by Seong Kim, System Architect in AMD’s Embedded Networking Division

The networking and communications industry is at a critical inflection point as it looks to embrace new technologies such as software-defined networking (SDN) and network function virtualization (NFV). While there are significant advantages to deploying a software-defined network, there are challenges as well. The implementation of SDN and NFV requires revamping network components and structures, and adopting new approaches to writing software for network management functions.

The hosting of SDN and NFV middleware and network management software on industry-standard processors is now being handled by modern multi-processor heterogeneous system architectures that incorporate both CPU and GPU resources within a single SoC.

What’s been missing until recently is a holistic view of networks and the technology providing a standardized separation of the control and data planes. SDN provides this capability, and can efficiently enable data center and service providers to manage network configuration, management, routing and policy enforcement for their evolving multi-tenant heterogeneous networks.

As defined by the Open Networking Foundation, SDN decouples the network control and forwarding functions, enabling the network control to become directly programmable and the underlying infrastructure to be abstracted for applications and network services.
Unlike server virtualization, which enables sharing of a single physical resource by many users or entities, virtualizing network resources enables a consolidation of different physical resources by overlaying virtual layers of networks on heterogeneous networks, resulting in a unified, logically homogenous network. Figure 1 describes three requirements that commonly define SDN architecture.

SDN Trends and Challenges

There are several different SDN deployment scenarios in the industry, although the original SDN concept proposes to have a centralized control plane with only the data plane remaining in the network.

On the controller implementation, three basic topologies are being considered in the industry. The first is a centralized topology in which one SDN controller controls all the switches in the network. This approach, however, incurs a higher risk of failure since it makes the central controller a single point of failure for the network. The second topology being investigated is the so-called distributed-centralized architecture, in which multiple “regional” SDN controllers, each controlling a subset of the network, communicate with a global central controller. This architecture eliminates single points of failure since one controller can take over the function of a failed controller. Finally, Orion proposes a hierarchical topology that may provide better network scalability.
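The distributed-centralized failover behavior described above can be sketched in a few lines of Python. This is a hypothetical illustration with made-up class and method names, not the API of any real controller such as ONOS or OpenDaylight:

```python
# Sketch of a distributed-centralized SDN control plane (illustrative only).

class RegionalController:
    def __init__(self, name):
        self.name = name
        self.switches = set()   # switch IDs this region manages
        self.alive = True

    def adopt(self, switch_id):
        self.switches.add(switch_id)

class CentralController:
    """Global controller: tracks regions and reassigns switches on failure."""
    def __init__(self):
        self.regions = []

    def register(self, region):
        self.regions.append(region)

    def handle_failure(self, failed):
        # A surviving peer takes over the failed region's switches, so no
        # regional controller is a single point of failure.
        failed.alive = False
        survivors = [r for r in self.regions if r.alive]
        if not survivors:
            raise RuntimeError("no surviving regional controllers")
        backup = min(survivors, key=lambda r: len(r.switches))
        backup.switches |= failed.switches
        failed.switches = set()
        return backup

central = CentralController()
east, west = RegionalController("east"), RegionalController("west")
central.register(east)
central.register(west)
east.adopt("s1")
east.adopt("s2")
west.adopt("s3")

backup = central.handle_failure(east)
print(backup.name, sorted(backup.switches))  # → west ['s1', 's2', 's3']
```

In a real deployment the takeover decision would be driven by heartbeats and load metrics rather than a simple switch count, but the shape of the recovery logic is the same.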

Apart from the controller, the data plane can also become a challenge in the transition to SDN, because traditional switching and forwarding devices/ASICs cannot easily support SDN traffic amid evolving standards. Hence the need for a hybrid approach: a portion of the network (e.g., the access network) can be SDN-enabled while the rest (e.g., the core network) remains a ‘traditional’ network. Traditional platforms sit in the intermediate nodes, acting as a big pipe, while SDN-enabled platforms serve as the switching and routing platforms. With this approach, an SDN network can be enabled immediately without overhauling the entire network.

Challenges in SDN are still emerging, as the definition of SDN continues to evolve. The scale-out network paradigm is evolving as well. Due to these uncertainties, abstraction mechanisms from different vendors will compete or co-exist. In addition, creation of SDN controllers and switches requires resolution of design challenges in many hardware platforms.

The data center environment is the most common use case for SDN. In the traditional data center network, there are ToR (Top of Rack), EoR (End of Row), aggregation and core switches. Multi-tier networking is a common configuration. To increase data center network manageability, SDN can abstract physical elements and represent them as logical elements using software. It treats all network elements as one large resource across multiple network segments. Therefore it can provide complete visibility of the network and manage policies across network nodes connected to virtual and physical switches.

Figure 2 shows a traditional multi-tier data center network and how an SDN controller can manage the entire network from a centralized location.

SDN’s basic tenet is to remove vendor-specific dependencies, reduce complexity and improve control, allowing the network to quickly adapt to changes in business needs. Other key SDN requirements are the disaggregation of control and data planes, and the integration of strong compute and packet processing capabilities. Companies are now collaborating to demonstrate the feasibility of a complete SDN solution utilizing the unique compute capabilities and power efficiency of heterogeneous, general purpose processors.

Software Enablement for SDN

One such demonstration of the integration needed to enable SDN is an ETSI NFV proof of concept in which several companies demonstrated the integration of the Data Plane Development Kit (DPDK) on an x86 platform and OpenDataPlane (ODP) on an ARM-based platform running OpenStack. The DPDK and ODP middleware enables fast packet I/O on general-purpose CPU platforms, eliminating the typical data-path bottleneck that arises without user-space pass-through support. This middleware is a must-have for an SDN solution, providing a unified interface across platforms, including x86 and ARM64.

High Compute Power at a Low Power Envelope

An SDN controller needs strong compute capability to handle the large volume of control traffic coming from many SDN switches, since each individual flow needs handling by the central SDN controller. This raises concerns about the controller’s performance and its role as a single point of failure.

There are different architectures proposed in the industry to mitigate the load on the central controller. One example is a distributed-centralized controller which has several SDN controllers, each managing a subsection of the network, with an additional control layer managing these regional controllers. This architecture requires smart, distributed and powerful compute capabilities throughout the entire network of SDN controllers. Different nodes, including SDN switch nodes, require different levels of performance and power. SDN implementations benefit from vendor platforms that offer a range of performance capabilities, matching the appropriate level of resources at the necessary point in the network design.

Security Enhancements

There is a growing need for security, and as the amount of control traffic increases, so does the need for crypto acceleration or offload. By offloading crypto operations to acceleration engines such as a Crypto Co-processor (CCP) on a CPU, or to the GPU, system-level performance can be maintained without compromising compute performance.

Deep Packet Inspection (DPI) - Understanding Network Traffic Flow

In order for an SDN controller to manage the network and associated policies, it requires a good ‘understanding’ of networking traffic. Centralized or distributed SDN architectures can support a deep understanding of traffic by collecting sets of packets from a traffic flow and analyzing them. There are two different ways to support this requirement.

Option 1—Based on the assumption of having a big pipe/channel between SDN switches and SDN controller, all of the deep packet inspection or application recognition can be done in the central controller with a powerful DPI engine.

Option 2—A small DPI engine can be implemented in the distributed SDN switches. These switches perform a basic deep packet inspection, then report the results or forward only streams of important traffic. The latter approach allows a cheaper and simpler implementation, in keeping with the basic SDN tenet.
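Option 2 can be sketched as a switch-side classifier that escalates only flagged flows to the controller. The port-to-protocol mapping and the pattern list below are illustrative assumptions, not a real DPI ruleset:

```python
# Sketch of Option 2: lightweight DPI at the switch; only "interesting"
# flows are reported to the central controller (patterns are placeholders).
import re

SUSPICIOUS = [re.compile(p) for p in (rb"\x00{8,}", rb"/etc/passwd")]

def classify(flow):
    """Return a minimal report; full payloads stay at the switch unless flagged."""
    proto = {80: "http", 443: "https", 53: "dns"}.get(flow["dst_port"], "other")
    flagged = any(p.search(flow["payload"]) for p in SUSPICIOUS)
    return {"flow_id": flow["id"], "proto": proto, "flagged": flagged}

def report_to_controller(flows):
    # Escalate only flagged flows, keeping the switch-to-controller
    # channel narrow -- the cheaper of the two options described above.
    return [r for r in map(classify, flows) if r["flagged"]]

flows = [
    {"id": 1, "dst_port": 80, "payload": b"GET /index.html"},
    {"id": 2, "dst_port": 80, "payload": b"GET /etc/passwd"},
]
reports = report_to_controller(flows)
print(reports)  # only flow 2 is escalated
```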

Low-cost, low-power processors can be used for DPI applications. Heterogeneous architectures that combine CPUs with GPUs, the latter highly optimized for highly parallel programmable workloads, provide a significant performance advantage.

I/O Integration

The main processor for SDN requires high-speed I/O interfaces, for example embedded network interfaces such as 1G and 10G Ethernet, as well as PCIe. Integrating these interfaces can lower system cost and ease system design complexity.


Complicating the development of new SDN solutions is the continuing evolution of standards. Throughout the industry, there are different approaches to enabling network virtualization (for example, VXLAN and NVGRE), and these standards continue to evolve as they move to their next phases. To meet the requirements of these evolving standards – and any emerging network overlay protocols – platforms must provide flexibility and ease of programmability. As an example, the transition from the OpenFlow 1.0 specification to OpenFlow 1.3 significantly increased complexity as it aimed to support many types of networking functions and protocols.
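To make the overlay-protocol point concrete, VXLAN encapsulation (RFC 7348) prepends an 8-byte header carrying a 24-bit VXLAN Network Identifier (VNI) to the original Ethernet frame. A minimal sketch, with a placeholder inner frame:

```python
# Build a VXLAN header per RFC 7348: flags word with the I bit set
# (VNI valid), followed by the 24-bit VNI shifted into the upper bytes.
import struct

def vxlan_encap(inner_frame: bytes, vni: int) -> bytes:
    if not 0 <= vni < 2**24:
        raise ValueError("VNI is a 24-bit field")
    # First 32-bit word: 0x08000000 sets the I flag; rest is reserved.
    # Second 32-bit word: VNI in bits 31..8, low byte reserved.
    header = struct.pack("!II", 0x08000000, vni << 8)
    return header + inner_frame

pkt = vxlan_encap(b"\xaa" * 14, vni=5000)   # 14-byte placeholder frame
print(len(pkt))                              # → 22 (8-byte header + frame)
print(int.from_bytes(pkt[4:7], "big"))       # → 5000 (the VNI)
```

In practice the VXLAN packet is then wrapped in UDP/IP for transport across the underlay; the header layout above is what an SDN-enabled switch must parse and rewrite at line rate.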

Platform Needs

Modern heterogeneous compute platforms contain the following three major function blocks:

  • General purpose, programmable scalar (CPU) and vector (GPU) processing cores
  • High-performance bus
  • Common, low-latency memory model

Leading heterogeneous designs are critical to maximizing throughput. For example, on AMD platforms incorporating Heterogeneous System Architecture (HSA), the CPU hands only pointers to the data blocks over to the GPU; the GPU processes the data in place and hands the pointers back to the CPU. HSA ensures cache coherency between the CPU and the GPU. Figure 3 depicts an overview of this architecture.

GPUs are extremely efficient for parallel processing applications, and they can also be used for crypto operations, DPI, classification, compression and other applications. In the case of crypto, the CPU does not have to be directly involved in data-plane crypto operations. With this architecture, system-level performance can be maintained even as the amount of traffic needing encryption or decryption grows. On a heterogeneous-capable processor, software can selectively accelerate or offload compute-intensive CPU operations to the GPU. A few additional functions that can be accelerated or offloaded to the GPU:

  • DPI: implement a RegEx engine
  • Security (such as IPsec) operations: RSA and other crypto operations
  • Compression operations for distributed storage applications
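The selective-offload idea can be sketched as a dispatcher that routes compute-heavy operations to an accelerator when one is present and falls back to the CPU otherwise. The "accelerator" here is simulated with stand-in CPU implementations (SHA-256 for crypto, zlib for compression); real HSA offload would go through a runtime such as ROCm:

```python
# Sketch of selective offload for DPI, crypto and compression work.
# Engine selection is the point; the operations are illustrative stand-ins.
import hashlib
import re
import zlib

class OffloadDispatcher:
    def __init__(self, has_accelerator=False):
        self.has_accelerator = has_accelerator

    def run(self, op, data):
        # Route to the GPU path when an accelerator is available.
        engine = "gpu" if self.has_accelerator else "cpu"
        result = {
            "dpi": lambda d: bool(re.search(rb"\x90{4,}", d)),  # RegEx engine
            "crypto": lambda d: hashlib.sha256(d).hexdigest(),  # crypto stand-in
            "compress": lambda d: zlib.compress(d),
        }[op](data)
        return engine, result

d = OffloadDispatcher(has_accelerator=True)
engine, hit = d.run("dpi", b"\x90" * 8)   # NOP-sled-like pattern
print(engine, hit)  # → gpu True
```

The design choice this illustrates is that the offload decision lives in one place, so the same application code runs unchanged on a CPU-only node at the network edge and on a GPU-equipped controller node.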

Figure 4 shows a number of different networking use cases and examples of where different levels of embedded processors integrate into the solution.


SDN introduces a new approach to network resource utilization and management, and each networking vendor in the market is looking for its own way to build SDN solutions. One key action that needs to be taken to enable SDN is to open up the intelligence of switches and routers to enable the abstraction of proprietary vendor technologies.

Mega data center players (Amazon, Google, Facebook and the like) are implementing technologies that allow them greater flexibility at lower cost. Amazon and Google are building their own networking (white box) switches so that they don't have to rely on platforms produced by OEM vendors. Facebook is driving the Open Compute Project (OCP) to develop specifications for open architecture switches that will be manufactured by low-cost original design manufacturers (ODMs). The open architecture approach championed by Facebook is creating an ecosystem in which standard, high-volume commodity platforms can be used to minimize CAPEX and OPEX.

SDN will drive the industry toward a more software-centric architecture and implementation. In this environment, OEMs will find it more difficult to provide platform differentiators. With SDN, the need for less expensive, easy-to-access hardware becomes paramount, and platform-specific, value-added services are deprioritized.

About the Author

Seong Kim is currently a system architect in AMD’s Embedded Networking Division. He has more than 15 years of experience in networking systems architecture and technical marketing. His recent initiatives include NFV, SDN, server virtualization, wireless communication networking, and security and threat management solutions. Dr. Kim’s work has appeared in numerous publications, including IEEE Communications and Elsevier magazines, and he has presented at several industry conferences and webinars. He holds several U.S. patents, with others pending, in the field of networking. Kim holds a Ph.D. in Electrical Engineering from the State University of New York and an M.B.A. from Lehigh University.

Got an idea for a Blueprint column? We welcome your ideas on next-gen network architecture.
See our guidelines.

Monday, June 23, 2014

NVIDIA and Partners Develop GPU-accelerated ARM64 Servers for HPC

NVIDIA is seeing progress in leveraging its GPU accelerators in supercomputers. Multiple server vendors are now developing 64-bit ARM development systems that integrate NVIDIA GPUs for high performance computing (HPC).

The new ARM64 servers feature Applied Micro X-Gene ARM64 CPUs and NVIDIA Tesla K20 GPU accelerators. The systems can run the hundreds of existing CUDA-accelerated scientific and engineering HPC applications, which need only be recompiled for ARM64.

The first GPU-accelerated ARM64 development platforms will be available in July from Cirrascale Corp. and E4 Computer Engineering, with production systems expected to ship later this year. The Eurotech Group also plans to ship production systems later this year. System details include:

  • Cirrascale RM1905D - High-density two-in-one 1U server with two Tesla K20 GPU accelerators; provides high-performance, low total cost of ownership for private cloud, public cloud, HPC, and enterprise applications.
  • E4 EK003 - Production-ready, low-power 3U, dual-motherboard server appliance with two Tesla K20 GPU accelerators, designed for seismic, signal and image processing, video analytics, track analysis, web applications and MapReduce processing. 
  • Eurotech - Ultra-high density, energy efficient and modular Aurora HPC server configuration, based on proprietary Brick Technology and featuring direct hot liquid cooling.

"We aim to leverage the latest technology advances, both within and beyond the HPC market, to move science forward in entirely new ways," said Pat McCormick, senior scientist at Los Alamos National Laboratory. "We are working with NVIDIA to explore how we can unite GPU acceleration with novel technologies like ARM to drive new levels of scientific discovery and innovation."

Thursday, May 22, 2014

AMD’s SeaMicro Server Sets OpenStack Record: 168,000 Virtual Machines

AMD has achieved record scalability of 168,000 virtual machines on 576 physical hosts, all provisioned on its SeaMicro SM15000 server. AMD said the first 75,000 virtual machines were deployed in six hours and thirty minutes.  The record demonstration was achieved in collaboration with Canonical using the Ubuntu OpenStack (Icehouse) distribution. MaaS (Metal as a Service), part of Ubuntu 14.04 LTS and Ubuntu OpenStack, was used to deliver the bare metal servers, storage and networking.

AMD’s SeaMicro SM15000 is a high-density, 10-rack-unit system that links 512 compute cores, 160 gigabits of I/O networking and more than five petabytes of storage with a 1.28 terabit-per-second supercompute fabric. The SM15000 design eliminates top-of-rack switches, terminal servers and hundreds of cables. It currently supports the next-generation AMD Opteron (“Piledriver” core) processor, the Intel Xeon E3-1260L (“Sandy Bridge”), E3-1265Lv2 (“Ivy Bridge”) and E3-1265Lv3 (“Haswell”), and the Intel Atom N570.

“This record validates that the SeaMicro SM15000 is well suited for massive OpenStack deployments,” said Dhiraj Mallick, corporate vice president and general manager, AMD Data Center Server Solutions. “The combination of Ubuntu OpenStack and the SeaMicro SM15000 server provides the industry’s leading solution to build cloud infrastructure that is highly responsive and ideal for on-demand services.”

Wednesday, January 29, 2014

AMD Readies 4- and 8-Core ARM-based Processors

AMD showcased a development platform for its first 64-bit ARM-based server CPU and announced the upcoming sampling of the ARM-based processor, named the AMD Opteron A1100 Series.

The AMD Opteron A1100 Series processors support:

  • 4- or 8-core ARM Cortex-A57 configurations
  • Up to 4 MB of shared L2 and 8 MB of shared L3 cache
  • Configurable dual DDR3 or DDR4 memory channels with ECC, at up to 1866 MT/s
  • Up to four SODIMMs, UDIMMs or RDIMMs
  • 8 lanes of PCI Express Gen 3 I/O
  • 8 Serial ATA 3 ports
  • Two 10 Gigabit Ethernet ports
  • ARM TrustZone technology for enhanced security
  • Crypto and data compression co-processors
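From the listed memory figures, a rough peak-bandwidth estimate follows directly (back-of-the-envelope arithmetic only; sustained bandwidth on real hardware is lower):

```python
# Theoretical peak memory bandwidth for the configuration listed above:
# dual channels at up to 1866 MT/s, each DDR3/DDR4 channel 64 bits (8 bytes) wide.
channels = 2
transfers_per_second = 1866e6  # 1866 MT/s
bytes_per_transfer = 8         # 64-bit channel width
peak_gb_per_s = channels * transfers_per_second * bytes_per_transfer / 1e9
print(f"{peak_gb_per_s:.1f} GB/s")  # roughly 29.9 GB/s theoretical peak
```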

"The needs of the data center are changing.  A one-size-fits-all approach typically limits efficiency and results in higher-cost solutions,” said Suresh Gopalakrishnan, corporate vice president and general manager of the AMD server business unit.  “The new ARM-based AMD Opteron A-Series processor brings the experience and technology portfolio of an established server processor vendor to the ARM ecosystem and provides the ideal complement to our established AMD Opteron x86 server processors."

Thursday, January 23, 2014

AMD Adds 12- and 16-core "Warsaw" Opteron Server Processors

AMD released new 12- and 16-core AMD Opteron 6300 Series server processors, code named “Warsaw,” designed for virtualized enterprise workloads.

The new AMD Opteron 6300 Series processors feature the "Piledriver" core and are fully socket and software compatible with the existing AMD Opteron 6300 Series. The company said the power efficiency and cost effectiveness of the new processors make them a good fit for the AMD Open 3.0 Open Compute Platform.

“With the continued move to virtualized environments for more efficient server utilization, more and more workloads are limited by memory capacity and I/O bandwidth,” said Suresh Gopalakrishnan, corporate vice president and general manager, Server Business Unit, AMD. “The Opteron 6338P and 6370P processors are server CPUs optimized to deliver improved performance per-watt for virtualized private cloud deployments with less power and at lower cost points.”

In May 2013, AMD introduced its Opteron X-Series x86 processors for scale-out server architectures.


Wednesday, May 29, 2013

AMD Debuts Opteron Processors for Scale-out Data Center Servers

AMD introduced its Opteron X-Series x86 processors for scale-out server architectures.

The first AMD Opteron X-Series processors, formerly known as “Kyoto,” will come in two variants:

  • The AMD Opteron X2150, which consumes as little as 11 watts, is the first server APU system-on-a-chip integrating CPU and GPU engines with a high-speed bus on a single die. It incorporates AMD Radeon HD 8000 graphics technology for multimedia-oriented server workloads.
  • The AMD Opteron X1150, which consumes as little as 9 watts, is a CPU-only version optimized for general scale-out workloads.

“The data center is at an inflection point and requires a high number of cores in a dense form factor with integrated graphics, massive amounts of DRAM and unprecedented power efficiency to keep up with the pace of innovation of Internet services,” said Andrew Feldman, corporate vice president and general manager, Server Business Unit at AMD.

“Fundamental changes in computing architectures are required to support space, power and cost demands organizations need to deliver compelling, new infrastructure economics,” said Paul Santeler, vice president and general manager, Hyperscale Server business segment, HP. “The new x86 AMD Opteron X-Series processors integrated into future HP Moonshot servers will continue to push the boundaries of power efficiency for social, mobile, cloud and big data workloads."