Showing posts with label AMD.

Monday, November 6, 2017

Intel partners with AMD on Embedded Multi-Die Interconnect Bridge for GPUs

Intel announced a partnership with AMD to tie together its high-performance processors with discrete graphics processors using the Intel Embedded Multi-Die Interconnect Bridge (EMIB) technology along with a new power-sharing framework.
The goal is to reduce the usual silicon footprint to less than half that of standard discrete components on a motherboard.

The first implementation combines the new 8th Gen Intel Core H-series processor, second-generation High Bandwidth Memory (HBM2) and a custom-to-Intel third-party discrete graphics chip from AMD’s Radeon Technologies Group, all in a single processor package.

“Our collaboration with Intel expands the installed base for AMD Radeon GPUs and brings to market a differentiated solution for high-performance graphics,” said Scott Herkelman, vice president and general manager, AMD Radeon Technologies Group. “Together we are offering gamers and content creators the opportunity to have a thinner-and-lighter PC capable of delivering discrete performance-tier graphics experiences in AAA games and content creation applications. This new semi-custom GPU puts the performance and capabilities of Radeon graphics into the hands of an expanded set of enthusiasts who want the best visual experience possible.”

Monday, December 12, 2016

AMD Outlines Strategy for GPU-Accelerated Machine Learning

AMD outlined its strategy to accelerate the machine intelligence era in server computing through a new suite of hardware and open-source software offerings. Plans include new AMD Radeon Instinct GPU accelerators for deep learning inference and training.

Along with the new hardware offerings, AMD announced MIOpen, a free, open-source library for GPU accelerators intended to enable high-performance machine intelligence implementations, and new, optimized deep learning frameworks on AMD's ROCm software to build the foundation of the next evolution of machine intelligence workloads.

"Radeon Instinct is set to dramatically advance the pace of machine intelligence through an approach built on high-performance GPU accelerators, and free, open-source software in MIOpen and ROCm," said AMD President and CEO, Dr. Lisa Su. "With the combination of our high-performance compute and graphics capabilities and the strength of our multi-generational roadmap, we are the only company with the GPU and x86 silicon expertise to address the broad needs of the datacenter and help advance the proliferation of machine intelligence."

http://www.amd.com

Sunday, October 16, 2016

Aliyun Looks to AMD for Cloud-based GPUs

AMD and Alibaba Cloud (Aliyun) announced a collaboration to strengthen research and cooperation related to the use of AMD Radeon Pro GPU technology in Alibaba Cloud’s global data centers.

“The partnership between AMD and Alibaba Cloud will bring both of our customers more diversified, cloud-based graphic processing solutions. It is our vision to work together with leading technology firms like AMD to empower businesses in every industry with cutting-edge technologies and computing capabilities,” said Simon Hu, president of Alibaba Cloud.

“The collaboration between AMD and Alibaba Cloud leverages the world-class technology and software engineering capabilities of both companies to meet the growing demand for standards-based GPU computing solutions capable of enabling more immersive and intuitive cloud services,” said AMD President and CEO Dr. Lisa Su. “Working closely with industry leaders like Alibaba Cloud helps ensure the investments AMD is making in our high-performance graphics and computing datacenter products continue to align with the needs of the broader cloud market.”

At this week's Computing Conference in Hangzhou, China, AMD is conducting the following demos:


  • An Alibaba Cloud Single Root Input/Output Virtualization (SR-IOV) Solution featuring AMD Radeon Pro server technology. The demo is powered by the Radeon FirePro™ S7150 x2 GPU featuring AMD Multi-user GPU (MxGPU) hardware-based server virtualization technology. The solution features the industry’s only hardware-virtualized GPU technology, which provides guaranteed service levels and improves security for remote workstation, cloud gaming, cloud computing, and Virtual Desktop Infrastructure (VDI) implementations.
  • A virtual reality (VR) experience demo powered by AMD Radeon VR Ready Premium graphics featuring AMD’s powerful, energy efficient Polaris graphics architecture.


http://www.amd.com

Wednesday, June 3, 2015

Blueprint: Enabling Smart Software Defined Networks

by Seong Kim, System Architect in AMD’s Embedded Networking Division

The networking and communications industry is at a critical inflection point as it looks to embrace new technologies such as software-defined networking (SDN) and network function virtualization (NFV). While there are significant advantages to deploying a software-defined network, there are challenges as well. The implementation of SDN and NFV requires revamping network components and structures, and adopting new approaches to writing software for network management functions.

SDN and NFV middleware and network management software can now be hosted on industry-standard processors: modern heterogeneous system architectures that incorporate both CPU and GPU resources within a single SoC.

What’s been missing until recently is a holistic view of networks and a technology providing a standardized separation of the control and data planes. SDN provides this capability, and can efficiently enable data center and service providers to handle network configuration, management, routing and policy enforcement for their evolving multi-tenant heterogeneous networks.

As defined by the Open Networking Foundation, SDN decouples the network control and forwarding functions, enabling the network control to become directly programmable and the underlying infrastructure to be abstracted for applications and network services.
Unlike server virtualization, which enables sharing of a single physical resource by many users or entities, virtualizing network resources enables a consolidation of different physical resources by overlaying virtual layers of networks on heterogeneous networks, resulting in a unified, logically homogenous network. Figure 1 describes three requirements that commonly define SDN architecture.
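The decoupling described above can be sketched in a few lines: a control plane programs match-action rules into switch flow tables, and the data plane forwards purely by table lookup, punting misses back to the controller. This is an illustrative Python sketch with invented class and field names, not code from any real controller.

```python
class FlowTable:
    """Data plane: forwards purely by installed match-action rules."""
    def __init__(self):
        self.rules = []  # list of (match_fields, action)

    def install(self, match, action):
        self.rules.append((match, action))

    def forward(self, packet):
        for match, action in self.rules:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        return "punt-to-controller"  # table miss: ask the control plane

class Controller:
    """Control plane: decides policy and programs every switch it manages."""
    def __init__(self, switches):
        self.switches = switches

    def set_policy(self, match, action):
        for sw in self.switches:       # one logical view, many devices
            sw.install(match, action)

sw1, sw2 = FlowTable(), FlowTable()
ctl = Controller([sw1, sw2])
ctl.set_policy({"dst": "10.0.0.2"}, "port-2")

print(sw1.forward({"dst": "10.0.0.2"}))  # port-2
print(sw2.forward({"dst": "10.0.0.9"}))  # punt-to-controller
```

The point of the sketch is that forwarding logic lives entirely in the table; policy changes touch only the controller.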

SDN Trends and Challenges

There are several different SDN deployment scenarios in the industry, although the original SDN concept proposes to have a centralized control plane with only the data plane remaining in the network.

On the controller implementation, three basic topologies are being considered in the industry. The first is a centralized topology where one SDN controller controls all the switches in the network. This approach, however, incurs a higher risk of failure since it makes the central controller a single point of failure for the network. The second topology being investigated is the so-called distributed-centralized architecture. In this approach multiple “regional” SDN controllers, each controlling a subset of the network, communicate with the (global) central controller. This architecture eliminates single points of failure since one controller can take over the function of a failed controller. Finally, Orion proposes a hierarchical topology that may provide better network scalability.
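The distributed-centralized topology can be sketched as follows: regional controllers each own a subset of switches, and when one fails, a surviving controller absorbs its switches. This is a minimal illustrative Python sketch; the names and the failover policy are invented, not from any real SDN controller.

```python
class RegionalController:
    """Owns a subset of the network's switches."""
    def __init__(self, name):
        self.name = name
        self.switches = set()
        self.alive = True

class GlobalController:
    """Supervises the regional controllers and handles failover."""
    def __init__(self, regionals):
        self.regionals = regionals

    def assign(self, controller, switch):
        controller.switches.add(switch)

    def handle_failure(self, failed):
        failed.alive = False
        # Pick any surviving regional controller as the backup.
        backup = next(c for c in self.regionals if c.alive)
        backup.switches |= failed.switches   # take over the orphaned switches
        failed.switches = set()
        return backup

east, west = RegionalController("east"), RegionalController("west")
g = GlobalController([east, west])
g.assign(east, "sw-1")
g.assign(west, "sw-2")

backup = g.handle_failure(east)
print(backup.name, sorted(backup.switches))  # west ['sw-1', 'sw-2']
```

In a real deployment the global controller would also re-synchronize flow state, which this sketch omits.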

Apart from the controller, the data plane can also become a challenge with the transition to SDN, because traditional switching and/or forwarding devices/ASICs will not be able to easily support SDN traffic due to evolving standards. Hence the need for a hybrid approach. Specifically, a portion of the network (e.g., the access network) can be SDN-enabled while the other portion (e.g., the core network) can remain a ‘traditional’ network. Thus traditional platforms are located in the intermediate nodes, acting as a big pipe, and SDN-enabled platforms serve as the switch and routing platforms. With this approach, an SDN network may be enabled immediately without requiring an overhaul of the entire network.

Challenges in SDN are still emerging, as the definition of SDN continues to evolve. The scale-out network paradigm is evolving as well. Due to these uncertainties, abstraction mechanisms from different vendors will compete or co-exist. In addition, creation of SDN controllers and switches requires resolution of design challenges in many hardware platforms.

The data center environment is the most common use case for SDN. In the traditional data center network, there are ToR (Top of Rack), EoR (End of Row), aggregation and core switches. Multi-tier networking is a common configuration. To increase data center network manageability, SDN can abstract physical elements and represent them as logical elements using software. It treats all network elements as one large resource across multiple network segments. Therefore it can provide complete visibility of the network and manage policies across network nodes connected to virtual and physical switches.

Figure 2 shows a traditional multi-tier data center network and how an SDN controller can manage the entire network from a centralized location.

SDN’s basic tenet is to remove vendor-specific dependencies, reduce complexity and improve control, allowing the network to quickly adapt to changes in business needs. Other key SDN requirements are the disaggregation of control and data planes, and the integration of strong compute and packet processing capabilities. Companies are now collaborating to demonstrate the feasibility of a complete SDN solution utilizing the unique compute capabilities and power efficiency of heterogeneous, general purpose processors.

Software Enablement for SDN

One such demonstration of the integration needed to enable SDN is an ETSI NFV proof-of-concept, in which several companies demonstrated the integration of the Data Plane Development Kit (DPDK) on an x86 platform and Open Data Plane (ODP) on an ARM-based platform running OpenStack. The DPDK and ODP middleware enables fast packet I/O on general-purpose CPU platforms, eliminating the typical data-path bottleneck that arises when packets cannot pass through the kernel directly to user space. This middleware is a must-have for enabling an SDN solution, providing a unified interface across platforms including x86 and ARM64.
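The data-path benefit of DPDK/ODP-style fast packet I/O comes largely from batched, poll-mode processing that amortizes a fixed per-crossing overhead over a burst of packets. The toy cost model below is illustrative only: the constants are invented and this is not DPDK or ODP code, but it shows the shape of the win.

```python
CROSSING_COST = 50   # hypothetical fixed cost per kernel crossing (cycles)
PER_PACKET = 5       # hypothetical per-packet processing cost (cycles)

def interrupt_driven(n_packets):
    # Classic path: one kernel crossing (interrupt + syscall) per packet.
    return n_packets * (CROSSING_COST + PER_PACKET)

def poll_mode_batched(n_packets, burst=32):
    # Poll-mode path: one poll per burst, packets handled in user space.
    bursts = -(-n_packets // burst)          # ceiling division
    return bursts * CROSSING_COST + n_packets * PER_PACKET

n = 1024
print(interrupt_driven(n))    # 56320 cycles under this toy model
print(poll_mode_batched(n))   # 6720 cycles under this toy model
```

Real poll-mode drivers add other savings (no interrupt latency, cache-friendly bursts), but the amortization effect alone dominates at high packet rates.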

High Compute Power at a Low Power Envelope

An SDN controller needs strong compute capability to handle the large amounts of control traffic coming from many SDN switches, since each individual flow needs handling by the central SDN controller. This raises concerns about the controller's performance and its potential to become a single point of failure.

There are different architectures proposed in the industry to mitigate the load on the central controller. One example is a distributed-centralized controller which has several SDN controllers, each managing a subsection of the network, with an additional control layer managing these regional controllers. This architecture requires smart, distributed and powerful compute capabilities throughout the entire network of SDN controllers. Different nodes, including SDN switch nodes, require different levels of performance and power. SDN implementations benefit from vendor platforms that offer a range of performance capabilities, matching the appropriate level of resources at the necessary point in the network design.

Security Enhancements

There is a growing need for security, and as the amount of control traffic increases, so does the need for crypto acceleration or offload. By offloading crypto operations to acceleration engines such as a Crypto Co-Processor (CCP) on a CPU or GPU, system-level performance can be maintained without compromising compute performance.
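One way to picture selective offload is a simple cost-based dispatch: hand a payload to the crypto engine only when the fixed hand-off cost is amortized by the faster per-byte rate. The cost model and thresholds below are invented for illustration; real CCP dispatch logic will differ.

```python
OFFLOAD_SETUP_US = 20     # assumed fixed cost to hand work to the engine
CPU_US_PER_KB = 4         # assumed CPU encryption cost per KB
ENGINE_US_PER_KB = 1      # assumed accelerator cost per KB

def choose_path(payload_kb):
    """Pick the cheaper execution path under the toy cost model above."""
    cpu_cost = payload_kb * CPU_US_PER_KB
    engine_cost = OFFLOAD_SETUP_US + payload_kb * ENGINE_US_PER_KB
    return "engine" if engine_cost < cpu_cost else "cpu"

print(choose_path(2))    # cpu    (8us on CPU vs 22us via the engine)
print(choose_path(64))   # engine (256us on CPU vs 84us via the engine)
```

The crossover point moves with the setup cost, which is why tightly integrated on-die engines are attractive: a smaller setup cost makes offload pay off for smaller payloads.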

Deep Packet Inspection (DPI) - Understanding Network Traffic Flow

In order for an SDN controller to manage the network and associated policies, it requires a good ‘understanding’ of networking traffic. Centralized or distributed SDN architectures can support a deep understanding of traffic by collecting sets of packets from a traffic flow and analyzing them. There are two different ways to support this requirement.

Option 1—Based on the assumption of having a big pipe/channel between SDN switches and SDN controller, all of the deep packet inspection or application recognition can be done in the central controller with a powerful DPI engine.

Option 2—A small DPI engine can be implemented in the distributed SDN switches. These switches perform a basic deep packet inspection, then report the results or forward only the streams of important traffic. The latter approach offers a cheaper and simpler implementation, in keeping with the basic SDN tenet.
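Option 2 can be sketched as a cheap pre-filter at the switch: scan payloads against a small pattern set and forward only flagged traffic to the controller. The packet format and patterns below are invented for illustration; a production DPI engine would use compiled automata rather than per-pattern scans.

```python
import re

# Hypothetical "suspicious payload" patterns compiled at the switch.
SWITCH_PATTERNS = [re.compile(rb"^GET /admin"), re.compile(rb"\x00\x01evil")]

def switch_prefilter(packets):
    """Runs at the edge: cheap scan, forward only flagged payloads."""
    return [p for p in packets
            if any(pat.search(p) for pat in SWITCH_PATTERNS)]

traffic = [b"GET /index.html", b"GET /admin/login",
           b"POST /api", b"\x00\x01evil-bytes"]
to_controller = switch_prefilter(traffic)
print(len(traffic), "packets seen,", len(to_controller), "sent to controller")
```

Even this toy filter shows the bandwidth saving: the controller receives only the flagged fraction of the flow instead of the whole stream.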

Low-cost and low-power processors can be used for DPI applications. The combination of CPUs and GPUs found in heterogeneous architectures provides a significant performance advantage, since GPUs are highly optimized for massively parallel programmable workloads.

I/O Integration

The main processor for SDN requires high-speed I/O interfaces: embedded network interfaces such as 1GbE and 10GbE, plus PCIe. Integrating these can lower system cost and ease system design complexity.

Software

Complicating the development of new SDN solutions is the continuing evolution of standards. Throughout the industry, there are different approaches to enabling network virtualization (for example, VXLAN and NVGRE), and these standards continue to evolve as they move to the next phases. In order to meet the requirements of these evolving standards, and any emerging network overlay protocols, platforms must provide flexibility and ease of programmability. As an example, the transition from the OpenFlow 1.0 spec to OpenFlow 1.3 significantly increased complexity as it aimed to support many types of networking functions and protocols.

Platform Needs

Modern heterogeneous compute platforms contain the following three major function blocks:

  • General-purpose, programmable scalar (CPU) and vector (GPU) processing cores
  • A high-performance bus
  • A common, low-latency memory model

Leading heterogeneous designs are critical to maximizing throughput. For example, on AMD platforms incorporating the Heterogeneous System Architecture (HSA), the CPU hands over only the pointers to the data blocks; the GPU takes the pointers, processes the data blocks in place at their existing memory locations, and hands them back to the CPU. HSA ensures cache coherency between the CPU and the GPU. Figure 3 depicts an overview of this architecture.
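The pointer hand-off can be illustrated, very loosely, in plain Python: the "CPU" stage passes references rather than copies, and the "GPU" stage mutates the data in place at its existing memory location. No real GPU or HSA runtime is involved here; the sketch only mirrors the zero-copy pattern described above.

```python
def cpu_stage(buffers):
    # Hand over references (the analogue of pointers), not copies.
    return [memoryview(b) for b in buffers]

def gpu_stage(views):
    # Process each block in place at its existing memory location.
    for v in views:
        for i in range(len(v)):
            v[i] ^= 0xFF          # stand-in for a real transform (e.g. crypto)

data = [bytearray(b"\x00\x0f"), bytearray(b"\xff")]
gpu_stage(cpu_stage(data))
print(data)  # [bytearray(b'\xff\xf0'), bytearray(b'\x00')] -- mutated in place
```

The original buffers are changed without ever being copied, which is the property a coherent shared-memory architecture gives the CPU/GPU pair at hardware speed.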

GPUs are extremely efficient for parallel processing applications, and they can also be used for crypto operations, DPI, classification, compression and other applications. In the case of crypto operations, the CPU doesn’t have to get involved in the data-plane crypto operation directly. With this architecture, system-level performance can be maintained even as the amount of traffic needing encryption or decryption increases. In a heterogeneous-capable processor, software can selectively accelerate or offload CPU compute-intensive operations to the GPU. Here are a few additional functions that can be accelerated or offloaded to the GPU:

  • DPI: implement a RegEx engine
  • Security (such as IPsec) operations: RSA, crypto operations
  • Compression operations for distributed storage applications


Figure 4 shows a number of different networking use cases and examples of where different levels of embedded processors integrate into the solution.

Conclusion

SDN introduces a new approach to network resource utilization and management, and each networking vendor in the market is looking for its own way to build SDN solutions. One key action that needs to be taken to enable SDN is to open up the intelligence of switches and routers to enable the abstraction of proprietary vendor technologies.

Mega data center players (Amazon, Google, Facebook and the like) are implementing technologies that allow them to enable greater flexibility and lower costs. Amazon and Google are building their own networking (white box) switches so that they don’t have to rely on the platforms produced by OEM vendors. Facebook is driving the Open Compute Project (OCP) to develop specifications for open-architecture switches that will be manufactured by low-cost original design manufacturers (ODMs). The open-architecture approach from Facebook is creating an ecosystem where standard, high-volume commodity platforms can be used to minimize CAPEX and OPEX.

SDN will drive the industry toward a more software-centric architecture and implementation, making it more difficult for OEMs to provide platform differentiators. With SDN, the need for less expensive and easy-to-access hardware becomes paramount, and platform-specific, value-added services are deprioritized.

About the Author

Seong Kim is currently a system architect in AMD’s Embedded Networking Division. He has more than 15 years of experience in networking systems architecture and technical marketing. His recent initiatives include NFV, SDN, server virtualization, wireless communication networking, and security and threat management solutions. Dr. Kim’s work has been published in numerous publications including IEEE Communications and Elsevier magazines, and he has presented at several industry conferences and webinars. He has several US patents granted and pending in the field of networking. Kim holds a Ph.D. in Electrical Engineering from State University of New York (SBU) and an M.B.A. from Lehigh University.



Got an idea for a Blueprint column?  We welcome your ideas on next gen network architecture.
See our guidelines.

Monday, June 23, 2014

NVIDIA and Partners Develop GPU-accelerated ARM64 Servers for HPC

NVIDIA is seeing progress in leveraging its GPU accelerators in supercomputers. Multiple server vendors are now developing 64-bit ARM development systems that integrate NVIDIA GPU processors for high-performance computing (HPC).

The new ARM64 servers feature Applied Micro X-Gene ARM64 CPUs and NVIDIA Tesla K20 GPU accelerators. The systems can run the hundreds of existing CUDA-accelerated scientific and engineering HPC applications simply by recompiling them for ARM64.

The first GPU-accelerated ARM64 development platforms will be available in July from Cirrascale Corp. and E4 Computer Engineering, with production systems expected to ship later this year. The Eurotech Group also plans to ship production systems later this year. System details include:

  • Cirrascale RM1905D - High-density two-in-one 1U server with two Tesla K20 GPU accelerators; provides high-performance, low total cost of ownership for private cloud, public cloud, HPC, and enterprise applications.
  • E4 EK003 - Production-ready, low-power 3U, dual-motherboard server appliance with two Tesla K20 GPU accelerators, designed for seismic, signal and image processing, video analytics, track analysis, web applications and MapReduce processing. 
  • Eurotech - Ultra-high density, energy efficient and modular Aurora HPC server configuration, based on proprietary Brick Technology and featuring direct hot liquid cooling.

"We aim to leverage the latest technology advances, both within and beyond the HPC market, to move science forward in entirely new ways," said Pat McCormick, senior scientist at Los Alamos National Laboratory. "We are working with NVIDIA to explore how we can unite GPU acceleration with novel technologies like ARM to drive new levels of scientific discovery and innovation."

http://nvidianews.nvidia.com/News/NVIDIA-GPUs-Open-the-Door-to-ARM64-Entry-Into-High-Performance-Computing-b52.aspx

Thursday, May 22, 2014

AMD’s SeaMicro Server Sets OpenStack Record: 168,000 Virtual Machines

AMD has achieved record scalability of 168,000 virtual machines on 576 physical hosts, all provisioned on its SeaMicro SM15000 server. AMD said the first 75,000 virtual machines were deployed in six hours and thirty minutes.  The record demonstration was achieved in collaboration with Canonical using the Ubuntu OpenStack (Icehouse) distribution. MaaS (Metal as a Service), part of Ubuntu 14.04 LTS and Ubuntu OpenStack, was used to deliver the bare metal servers, storage and networking.

AMD’s SeaMicro SM15000 is a high-density, 10 rack unit system that links 512 compute cores, 160 gigabits of I/O networking and more than five petabytes of storage with a 1.28 terabits-per-second high-performance supercompute fabric.  The SM15000 server design eliminates top-of-rack switches, terminal servers and hundreds of cables.  It currently supports the next-generation AMD Opteron (Piledriver core) processor, Intel Xeon E3-1260L (Sandy Bridge), E3-1265Lv2 (Ivy Bridge), E3-1265Lv3 (Haswell) and Intel Atom N570 processors.

“This record validates that the SeaMicro SM15000 is well suited for massive OpenStack deployments,” said Dhiraj Mallick, corporate vice president and general manager, AMD Data Center Server Solutions. “The combination of Ubuntu OpenStack and the SeaMicro SM15000 server provides the industry’s leading solution to build cloud infrastructure that is highly responsive and ideal for on-demand services.”

http://www.amd.com/en-us/press-releases/Pages/seamicro-sets-record-2014may13.aspx

Wednesday, January 29, 2014

AMD Readies 4- and 8-core ARM-based Processors

AMD showcased a development platform for its first 64-bit ARM-based server CPU and announced the upcoming sampling of the ARM-based processor, named the AMD Opteron A1100 Series.

The AMD Opteron A1100 Series processors support:

  • 4 or 8 ARM Cortex-A57 cores
  • Up to 4 MB of shared L2 and 8 MB of shared L3 cache
  • Configurable dual DDR3 or DDR4 memory channels with ECC at up to 1866 MT/second
  • Up to 4 SODIMM, UDIMM or RDIMMs
  • 8 lanes of PCI-Express Gen 3 I/O
  • 8 Serial ATA 3 ports
  • 2x 10 Gigabit Ethernet ports
  • ARM TrustZone technology for enhanced security
  • Crypto and data compression co-processors

"The needs of the data center are changing.  A one-size-fits-all approach typically limits efficiency and results in higher-cost solutions,” said Suresh Gopalakrishnan, corporate vice president and general manager of the AMD server business unit.  “The new ARM-based AMD Opteron A-Series processor brings the experience and technology portfolio of an established server processor vendor to the ARM ecosystem and provides the ideal complement to our established AMD Opteron x86 server processors."

http://www.amd.com

Thursday, January 23, 2014

AMD Adds 12- and 16-core "Warsaw" Opteron Server Processors

AMD released new 12- and 16-core AMD Opteron 6300 Series server processors, code named “Warsaw,” designed for virtualized enterprise workloads.

The new AMD Opteron 6300 Series processors feature the "Piledriver" core and are fully socket and software compatible with the existing AMD Opteron 6300 Series. The company said the power efficiency and cost effectiveness of the new processors make them a good fit for the AMD Open 3.0 Open Compute Platform.

“With the continued move to virtualized environments for more efficient server utilization, more and more workloads are limited by memory capacity and I/O bandwidth,” said Suresh Gopalakrishnan, corporate vice president and general manager, Server Business Unit, AMD. “The Opteron 6338P and 6370P processors are server CPUs optimized to deliver improved performance per-watt for virtualized private cloud deployments with less power and at lower cost points.”

http://www.amd.com/us/press-releases/Pages/amd-offers-new-levels-2014jan22.aspx


Wednesday, May 29, 2013

AMD Debuts Opteron Processors for Scale-out Data Center Servers

AMD introduced its Opteron X-Series x86 processors for scale-out server architectures.

The first AMD Opteron X-Series processors, formerly known as “Kyoto,” will come in two variants:

  • The AMD Opteron X2150, which consumes as little as 11 watts, is the first server APU system-on-a-chip integrating CPU and GPU engines with a high-speed bus on a single die. It incorporates AMD Radeon HD 8000 graphics technology for multimedia-oriented server workloads.
  • The AMD Opteron X1150, which consumes as little as 9 watts, is a CPU-only version optimized for general scale-out workloads.

“The data center is at an inflection point and requires a high number of cores in a dense form factor with integrated graphics, massive amounts of DRAM and unprecedented power efficiency to keep up with the pace of innovation of Internet services,” said Andrew Feldman, corporate vice president and general manager, Server Business Unit at AMD.

“Fundamental changes in computing architectures are required to support space, power and cost demands organizations need to deliver compelling, new infrastructure economics,” said Paul Santeler, vice president and general manager, Hyperscale Server business segment, HP. “The new x86 AMD Opteron X-Series processors integrated into future HP Moonshot servers will continue to push the boundaries of power efficiency for social, mobile, cloud and big data workloads."

http://www.amd.com/us/press-releases/Pages/amd-launches-the-2013may29.aspx
