Showing posts with label SuperComputing. Show all posts

Thursday, November 8, 2018

SCinet anticipates peak loads of 4 Tbps next week

SCinet, the temporary network serving the SC18 conference in Dallas, anticipates a peak load of 4.02 terabits per second (Tbps) as participants in this year's top supercomputing event from academia, government and industry demonstrate their capabilities. Last year’s peak load was a record 3.6 Tbps.

Forty organizations have collaborated to build SCinet at an estimated cost of $51 million. The network has taken one year to plan and one month to build. It will operate for one week and then be torn down in less than 24 hours.

“SCinet can only flourish due to the incredible generosity of our contributing partners,” said Jason Zurawski, SCinet chair and science engagement engineer at the Energy Sciences Network (ESnet). “CenturyLink, Cisco, and Juniper have all gone above and beyond to ensure the success of SCinet this year through the donation of hardware, software, services, and the most important of resources: time.”

https://sc18.supercomputing.org/

Tuesday, November 15, 2016

Intel Cites Gains with its Omni-Path Architecture Systems

Intel cited growing momentum in the nine months since Intel Omni-Path Architecture (Intel OPA) began shipping. The company said OPA is becoming the standard fabric for 100 gigabit (Gb) systems: it is now featured in 28 of the top 500 most powerful supercomputers in the world announced at Supercomputing 2016, twice the number of InfiniBand EDR systems on the list, and Intel believes OPA now holds 66 percent of the 100Gb market.

Top500 designs include Oakforest-PACS, MIT Lincoln Lab and CINECA. The Intel OPA systems on the list add up to a total of 43.7 petaflops (Rmax) of floating-point performance, or 2.5 times the FLOPS of all InfiniBand EDR systems.
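For context, that 2.5x comparison implies a combined Rmax of roughly 17.5 petaflops across the InfiniBand EDR systems on the list. A rough back-of-the-envelope check, treating Intel's rounded figures as exact:

```python
# Implied InfiniBand EDR total from Intel's published comparison
# (assumes the quoted 43.7 Pflop/s and 2.5x figures are exact rounded values).
opa_total_rmax_pflops = 43.7     # combined Rmax of the Intel OPA systems
ratio_vs_edr = 2.5               # stated multiple over InfiniBand EDR systems

edr_total_rmax_pflops = opa_total_rmax_pflops / ratio_vs_edr
print(f"Implied InfiniBand EDR total: {edr_total_rmax_pflops:.1f} Pflop/s")  # ~17.5
```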

Intel OPA is an end-to-end fabric solution that improves the performance of HPC workloads for clusters of all sizes, achieving up to 9 percent higher application performance and up to 37 percent lower fabric costs on average compared to InfiniBand EDR.

https://newsroom.intel.com/newsroom/wp-content/uploads/sites/11/2016/11/supercomputing-2016-fact-sheet.pdf

Sunday, September 11, 2016

Juniper Supplies QFX Switches for NCAR Super Computer

The National Center for Atmospheric Research (NCAR) has selected Juniper Networks to provide the networking infrastructure for a new supercomputer that will be used by researchers to predict climate patterns and assess the effects of global warming.

The new supercomputer, which will be installed at the NCAR-Wyoming Supercomputing Center (NWSC), will enable more accurate and actionable projections about the impact of weather and climate change and is expected to perform 5.34 quadrillion calculations per second, making it one of the top performing supercomputers in the world.

The network will use Juniper's QFX10008 Switch, a high-performance, high-density switch that is a foundational element of the Juniper MetaFabric Architecture. NCAR already uses Juniper's EX Series Ethernet Switches, QFX Series switches and MX Series 3D Universal Edge Routers.

http://www.juniper.net

Juniper Debuts Spine Switches for Cloud Scalability

Juniper Networks is introducing a new line of spine switches designed to bring physical and logical scale, performance and port density to cloud data centers.

The new QFX10000 line, which builds on the company's MetaFabric architecture, is powered by Juniper’s latest custom ASIC, the Q5. The new chip offers deep buffers and an architecture that supports virtualization for SDN, with the logical scale needed for applications such as big data, video and IP storage.

Some highlights:
  • The new Juniper QFX10002 is a fixed-configuration, 2-rack-unit switch that supports both 40GE and 100GE within the same platform. 
  • The QFX10008 is a modular, eight-slot chassis that delivers groundbreaking 100GE port density and up to 48Tbps of total system capacity.
  • The QFX10016 will deliver unprecedented system capacity of up to 96Tbps, combined with leading port density, in a powerful 16-slot chassis. 
  • The QFX10000 line supports orchestration, automation and management solutions from market leaders such as VMware, as well as open source solutions such as OpenStack and OpenContrail, to simplify and automate network management and provisioning.
Juniper is also introducing its Junos Fusion automation-enabled software architecture for data centers.  It enables coherency through automation and manages the entire data center as a single network rather than individual network elements. The company said Junos Fusion will make it easy to automate the turn-up and configuration of new networks while reducing the risk of configuration errors. In conjunction with EVPN/VxLAN, it simplifies application placement across multiple layers of the network.

Tuesday, April 5, 2016

NVIDIA Unveils GPU Accelerators for Deep Learning AI

NVIDIA unveiled its most advanced accelerator to date -- the Tesla P100 -- based on the Pascal architecture and composed of an array of Graphics Processing Clusters (GPCs), Streaming Multiprocessors (SMs), and memory controllers. The Tesla P100, which is implemented in 16nm FinFET on a massive 610mm² die, enables a new class of servers that can deliver the performance of hundreds of CPU server nodes.

NVIDIA said its accelerator brings five breakthroughs:

  • NVIDIA Pascal architecture for exponential performance leap -- a Pascal-based Tesla P100 solution delivers over a 12x increase in neural network training performance compared with a previous-generation NVIDIA Maxwell-based solution.
  • NVIDIA NVLink for maximum application scalability -- The NVIDIA NVLink high-speed GPU interconnect scales applications across multiple GPUs, delivering a 5x acceleration in bandwidth compared to today's best-in-class solution. Up to eight Tesla P100 GPUs can be interconnected with NVLink to maximize application performance in a single node, and IBM has implemented NVLink on its POWER8 CPUs for fast CPU-to-GPU communication.
  • 16nm FinFET for unprecedented energy efficiency -- with 15.3 billion transistors built on 16 nanometer FinFET fabrication technology, the Pascal GPU is the world's largest FinFET chip ever built.
  • CoWoS with HBM2 for big data workloads -- the Pascal architecture unifies processor and data into a single package to deliver unprecedented compute efficiency. An innovative approach to memory design, Chip on Wafer on Substrate (CoWoS) with HBM2, provides a 3x boost in memory bandwidth performance, or 720GB/sec, compared to the Maxwell architecture.
  • New AI algorithms for peak performance -- new half-precision instructions deliver more than 21 teraflops of peak performance for deep learning (see the worked estimate after this list).
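For readers wondering where the "more than 21 teraflops" figure comes from, the sketch below reproduces the usual peak-FLOPS arithmetic. The SM count, cores per SM and boost clock are assumptions drawn from NVIDIA's published Tesla P100 specifications; they do not appear in this announcement.

```python
# Rough derivation of the Tesla P100's half-precision peak (illustrative only;
# SM count, cores per SM and boost clock are assumed from NVIDIA's published
# P100 specifications, not stated in this article).
sms = 56                   # streaming multiprocessors
fp32_cores_per_sm = 64
boost_clock_hz = 1.48e9    # ~1480 MHz boost clock

fp32_cores = sms * fp32_cores_per_sm           # 3,584 CUDA cores
fp32_peak = fp32_cores * 2 * boost_clock_hz    # 2 FLOPs/core/cycle (fused multiply-add)
fp16_peak = 2 * fp32_peak                      # Pascal runs FP16 at twice the FP32 rate

print(f"FP32 peak: {fp32_peak / 1e12:.1f} TFLOPS")   # ~10.6
print(f"FP16 peak: {fp16_peak / 1e12:.1f} TFLOPS")   # ~21.2
```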

At its GPU Technology Conference in San Jose, NVIDIA also unveiled its DGX-1 deep learning supercomputer, a turnkey system that integrates eight Tesla P100 GPU accelerators and delivers the equivalent throughput of 250 x86 servers.

"Artificial intelligence is the most far-reaching technological advancement in our lifetime," said Jen-Hsun Huang, CEO and co-founder of NVIDIA. "It changes every industry, every company, everything. It will open up markets to benefit everyone. Data scientists and AI researchers today spend far too much time on home-brewed high performance computing solutions. The DGX-1 is easy to deploy and was created for one purpose: to unlock the powers of superhuman capabilities and apply them to problems that were once unsolvable."

"NVIDIA GPU is accelerating progress in AI. As neural nets become larger and larger, we not only need faster GPUs with larger and faster memory, but also much faster GPU-to-GPU communication, as well as hardware that can take advantage of reduced-precision arithmetic. This is precisely what Pascal delivers," said Yann LeCun, director of AI Research at Facebook.

http://nvidianews.nvidia.com

Wednesday, February 17, 2016

Mellanox Supplies 100G InfiniBand for European Supercomputer Center

The Flemish Supercomputer Center (VSC) in Belgium has selected Mellanox’s end-to-end 100Gb/s EDR interconnect solutions to be integrated into a new LX-series supercomputer from NEC. The system will be the fastest supercomputer (peak performance of 623 Teraflops) and the first complete end-to-end EDR 100Gb/s InfiniBand system in Belgium.

“Mellanox is thrilled to contribute to a project that will accelerate scientific discovery and has the capability to solve some of the top problems plaguing scientists and researchers today,” said Gilad Shainer, vice president of marketing, Mellanox Technologies. “This new supercomputer will position the university to make discoveries critical for the continuous advancement of science. VSC selected Mellanox for the performance levels only EDR 100Gb/s InfiniBand can deliver and is a prime example of the growing global demand for intelligent, fast, highly reliable, and cost efficient interconnect technology.”

https://www.vscentrum.be
http://www.mellanox.com

Wednesday, July 29, 2015

U.S. Launches National Strategic Computing Initiative

President Obama issued an Executive Order establishing the National Strategic Computing Initiative (NSCI), which aims to bolster the development and deployment of high-performance computing (HPC) systems.

The initiative represents a “coordinated research, development, and deployment strategy” that will draw on the strengths of departments and agencies to move the Federal government into a position that sharpens, develops, and streamlines a wide range of new 21st century applications. It is designed to advance core technologies to solve difficult computational problems and foster increased use of the new capabilities in the public and private sectors.

The National Strategic Computing Initiative has five strategic themes.

  • Create systems that can apply exaflops of computing power to exabytes of data.
  • Keep the United States at the forefront of HPC capabilities.
  • Improve HPC application developer productivity.
  • Make HPC readily available.
  • Establish hardware technology for future HPC systems. 

https://www.whitehouse.gov/sites/default/files/microsites/ostp/nsci_fact_sheet.pdf

Monday, July 13, 2015

Intel Shows its Omni-Path Architecture for HPC

Intel conducted the first public "powered-on" demonstration of its Omni-Path Architecture, a next-generation fabric technology for high performance computing (HPC) clusters.

The demonstration, conducted at the ISC2015 show in Frankfurt, featured Intel Omni-Path Architecture (Intel OPA), an end-to-end solution, including PCIe adapters, silicon, switches, cables, and management software, that builds on the existing Intel True Scale Fabric and InfiniBand. Intel OPA was designed to address the challenge that processor capacity and memory bandwidth have been scaling faster than system I/O. It accelerates message passing interface (MPI) rates in next-generation systems. Intel OPA also promises the ability to scale to tens — and eventually hundreds — of thousands of nodes.
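As an illustration of what an MPI "message rate" measures, the minimal mpi4py microbenchmark below counts small messages per second between two ranks. It is a sketch for illustration only, not Intel's benchmark methodology, and assumes mpi4py and an MPI launcher are available.

```python
# Minimal MPI message-rate microbenchmark (illustrative only).
# Run with, e.g.: mpirun -np 2 python msgrate.py
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
N = 100_000                 # number of small messages
payload = bytearray(8)      # 8-byte payload stresses message rate, not bandwidth

comm.Barrier()
start = time.perf_counter()
if rank == 0:
    for _ in range(N):
        comm.Send(payload, dest=1, tag=0)
elif rank == 1:
    buf = bytearray(8)
    for _ in range(N):
        comm.Recv(buf, source=0, tag=0)
comm.Barrier()
elapsed = time.perf_counter() - start

if rank == 0:
    print(f"{N / elapsed / 1e6:.2f} million messages per second")
```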

Intel Omni-Path Architecture uses technologies acquired from both QLogic and Cray, as well as Intel-developed technologies. In the near future, Intel says it will integrate the Intel Omni-Path Host Fabric Interface onto future generations of Intel Xeon processors and Intel Xeon Phi processors.

Intel also announced a new collaboration with HP to develop purpose-built HP Apollo systems designed to expand the use of HPC solutions to enterprises of all sizes. The purpose-built HP Apollo compute platforms will utilize the Intel HPC scalable system framework, including next-generation Intel Xeon processors, the Intel Xeon Phi product family, Intel Omni-Path Architecture and the Intel Enterprise Edition of Lustre software.

http://www.intel.com/content/www/us/en/high-performance-computing-fabrics/omni-path-architecture-fabric-overview.html


In April 2015, Intel and Cray were selected to build two next generation, high-performance computing (HPC) systems that will be five to seven times more powerful than the fastest supercomputers today.

Intel will serve as prime contractor to deliver the supercomputers for the U.S. Department of Energy’s (DOE) Argonne Leadership Computing Facility (ALCF). The Aurora system will be based on Intel’s HPC scalable system framework and will be a next-generation Cray “Shasta” supercomputer. Intel said the Aurora system will be delivered in 2018 and have a peak performance of 180 petaflops, making it the most powerful system announced to date. Aurora will use future generations of Intel Xeon Phi processors and the Intel Omni-Path Fabric high-speed interconnect technology, a new non-volatile memory architecture and advanced file system storage using Intel Lustre software.



In November 2014, Intel confirmed that its third-generation Intel Xeon Phi product family, code-named Knights Hill, will be built using 10nm process technology and that it will integrate Intel Omni-Path Fabric technology. Knights Hill will follow the upcoming Knights Landing product, with first commercial systems based on Knights Landing expected to begin shipping next year.

Intel also disclosed that its Intel Omni-Path Architecture will achieve 100 Gbps line speed and up to 56 percent lower switch fabric latency in medium-to-large clusters than InfiniBand alternatives. The architecture targets a 48 port switch chip compared to the current 36 port InfiniBand alternatives. This will reduce the number of switches required in HPC clusters.
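To see why higher switch radix matters, the sketch below counts the switches needed for a non-blocking two-tier (leaf/spine) fat tree. The topology is a simplifying assumption for illustration; the announcement does not specify one.

```python
import math

def two_tier_switch_count(radix, nodes):
    """Switches needed to connect `nodes` endpoints in a non-blocking
    two-tier fat tree built from switches with `radix` ports."""
    max_nodes = radix * radix // 2              # each leaf: radix/2 down, radix/2 up
    if nodes > max_nodes:
        raise ValueError("cluster too large for two tiers at this radix")
    leaves = math.ceil(nodes / (radix // 2))
    spines = math.ceil(leaves / 2)              # full bisection: one spine per two leaves
    return leaves + spines

for radix in (36, 48):
    print(f"{radix}-port switches: {two_tier_switch_count(radix, 600)} for a 600-node cluster")
# 36-port: ~51 switches; 48-port: ~38 switches
```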

Thursday, April 9, 2015

Intel and Cray to Build Next Gen Supercomputers for DoE

Intel and Cray have been selected to build two next generation, high-performance computing (HPC) systems that will be five to seven times more powerful than the fastest supercomputers today.

Intel will serve as prime contractor to deliver the supercomputers for the U.S. Department of Energy’s (DOE) Argonne Leadership Computing Facility (ALCF). The Aurora system will be based on Intel’s HPC scalable system framework and will be a next-generation Cray “Shasta” supercomputer. Intel said the Aurora system will be delivered in 2018 and have a peak performance of 180 petaflops, making it the most powerful system announced to date. Aurora will use future generations of Intel Xeon Phi processors and the Intel Omni-Path Fabric high-speed interconnect technology, a new non-volatile memory architecture and advanced file system storage using Intel Lustre software.

A second system, to be named Theta, will serve as an early production system for the ALCF. To be delivered in 2016, the system will provide performance of 8.5 petaflops while requiring only 1.7 megawatts of power. The Theta system will be powered by Intel Xeon processors and next-generation Intel Xeon Phi processors, code-named Knights Landing, and will be based on the next-generation Cray XC supercomputer.
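Taken together, those figures imply an energy efficiency of about 5 gigaflops per watt at peak. A quick derivation, treating the rounded 8.5 petaflops and 1.7 megawatts as exact:

```python
# Implied peak energy efficiency of Theta from the quoted figures.
peak_pflops = 8.5    # peak performance, Pflop/s
power_mw = 1.7       # power draw, megawatts

gflops_per_watt = (peak_pflops * 1e6) / (power_mw * 1e6)
print(f"{gflops_per_watt:.1f} GFLOPS per watt")   # 5.0
```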

“The Aurora system will be one of the most advanced supercomputers ever built, and Cray is honored and proud to be collaborating with two great partners in Intel and Argonne National Lab,” said Peter Ungaro, president and CEO of Cray. “The combination of Cray’s vast experience in building some of the world’s largest and most productive supercomputers, combined with Intel’s cutting-edge technologies will provide the ALCF with a leadership-class system that will be ready for advancing scientific discovery from day one.”

http://newsroom.intel.com/community/intel_newsroom/blog/2015/04/09/chip-shot-intel-selected-by-us-department-of-energy-to-deliver-nations-most-powerful-supercomputer

In November 2014, Intel announced that its third-generation Intel Xeon Phi product family, code-named Knights Hill, will be built using Intel's 10nm process technology and integrate Intel Omni-Path Fabric technology. Knights Hill will follow the upcoming Knights Landing product, with first commercial systems based on Knights Landing expected to begin shipping next year.

Intel also disclosed that its Intel Omni-Path Architecture will achieve 100 Gbps line speed and up to 56 percent lower switch fabric latency in medium-to-large clusters than InfiniBand alternatives. The architecture targets a 48 port switch chip compared to the current 36 port InfiniBand alternatives. This will reduce the number of switches required in HPC clusters.

Monday, November 17, 2014

IBM Lands $325M Contracts for Supercomputers in National Labs

The U.S. Department of Energy has awarded IBM contracts valued at $325 million to develop and deliver the world’s most advanced “data centric” supercomputing systems at Lawrence Livermore and Oak Ridge National Laboratories.

IBM said its new systems will employ a “data centric” approach that puts computing power everywhere data resides, minimizing data in motion and energy consumption. These OpenPOWER-based systems are expected to offer five to 10 times better performance on commercial and high-performance computing applications compared to the current systems at the labs, while being more than five times more energy efficient.

The “Sierra” supercomputer at Lawrence Livermore and “Summit” supercomputer at Oak Ridge will each have a peak performance well in excess of 100 petaflops balanced with more than five petabytes of dynamic and flash memory to help accelerate the performance of data centric applications. IBM's design will be capable of moving data to the processor, when necessary, at more than 17 petabytes per second (which is equivalent to moving over 100 billion photos on Facebook in a second).
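The Facebook-photo analogy works out to an assumed average photo size of roughly 170 kilobytes, a quick sanity check on the quoted numbers:

```python
# What average photo size makes 17 PB/s equal 100 billion photos per second?
bytes_per_second = 17e15       # 17 petabytes per second (using 10^15 bytes per PB)
photos_per_second = 100e9      # 100 billion photos per second

implied_photo_kb = bytes_per_second / photos_per_second / 1e3
print(f"Implied average photo size: {implied_photo_kb:.0f} KB")   # ~170 KB
```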

http://www.ibm.com


Monday, June 23, 2014

NVIDIA and Partners Develop GPU-accelerated ARM64 Servers for HPC

NVIDIA is seeing progress in leveraging its GPU accelerators in supercomputers. Multiple server vendors are now developing 64-bit ARM development systems that integrate NVIDIA GPU accelerators for high performance computing (HPC).

The new ARM64 servers feature Applied Micro X-Gene ARM64 CPUs and NVIDIA Tesla K20 GPU accelerators. The systems can run the hundreds of existing CUDA-accelerated scientific and engineering HPC applications simply by recompiling them for ARM64.

The first GPU-accelerated ARM64 development platforms will be available in July from Cirrascale Corp. and E4 Computer Engineering, with production systems expected to ship later this year. The Eurotech Group also plans to ship production systems later this year. System details include:

  • Cirrascale RM1905D - High-density two-in-one 1U server with two Tesla K20 GPU accelerators; provides high-performance, low total cost of ownership for private cloud, public cloud, HPC, and enterprise applications.
  • E4 EK003 - Production-ready, low-power 3U, dual-motherboard server appliance with two Tesla K20 GPU accelerators, designed for seismic, signal and image processing, video analytics, track analysis, web applications and MapReduce processing. 
  • Eurotech - Ultra-high density, energy efficient and modular Aurora HPC server configuration, based on proprietary Brick Technology and featuring direct hot liquid cooling.

"We aim to leverage the latest technology advances, both within and beyond the HPC market, to move science forward in entirely new ways," said Pat McCormick, senior scientist at Los Alamos National Laboratory. "We are working with NVIDIA to explore how we can unite GPU acceleration with novel technologies like ARM to drive new levels of scientific discovery and innovation."

http://nvidianews.nvidia.com/News/NVIDIA-GPUs-Open-the-Door-to-ARM64-Entry-Into-High-Performance-Computing-b52.aspx

Top 500 Supercomputer List for June 2014

A new list of the Top 500 Supercomputer Sites has just been published, and for the third consecutive time the most powerful designation goes to the Tianhe-2 supercomputer at China's National University of Defense Technology, with a performance of 33.86 petaflops delivered by its 3.1 million Intel Xeon and Xeon Phi cores.

A few highlights from the list:


  • Total combined performance of all 500 systems has grown to 274 Pflop/s, compared to 250 Pflop/s six months ago and 223 Pflop/s one year ago. This increase in installed performance also exhibits a noticeable slowdown in growth compared to the previous long-term trend.
  • There are 37 systems with performance greater than a Pflop/s on the list, up from 31 six months ago.
  • The No. 1 system, Tianhe-2, and the No. 7 system, Stampede, use Intel Xeon Phi processors to speed up their computational rate. The No. 2 system, Titan, and the No. 6 system, Piz Daint, use NVIDIA GPUs to accelerate computation.
  • A total of 62 systems on the list are using accelerator/co-processor technology, up from 53 in November 2013. Forty-four of these use NVIDIA chips, two use ATI Radeon, and there are now 17 systems with Intel MIC technology (Xeon Phi). The average number of accelerator cores for these 62 systems is 78,127 cores per system.


The full list is posted here:

http://www.top500.org/lists/2014/06/

Tuesday, November 19, 2013

Intel Tunes its Xeon Phi for Programming Simplicity and Performance

Intel confirmed that its next generation Xeon Phi devices (codenamed "Knights Landing"), available as a host processor, will fit into standard rack architectures and run applications entirely natively instead of requiring data to be offloaded to the coprocessor.

Intel said this approach will significantly reduce programming complexity and eliminate “offloading” of the data, thus improving performance and decreasing latencies caused by memory, PCIe and networking. Knights Landing will also offer developers three memory options to optimize performance.

In addition, Intel and Fujitsu recently announced an initiative that could potentially replace a computer’s electrical wiring with fiber optic links to carry Ethernet or PCI Express traffic over an Intel Silicon Photonics link. This enables Intel Xeon Phi coprocessors to be installed in an expansion box, separated from host Intel Xeon processors, but function as if they were still located on the motherboard. This allows for much higher density of installed coprocessors and scaling the computer capacity without affecting host server operations.

http://www.intel.com

  • In June, Intel announced five new Xeon Phi coprocessors:  the Intel Xeon Phi coprocessor 7100 family is designed and optimized to provide the best performance and offer the highest level of features, including 61 cores clocked at 1.23GHz, 16 GB of memory capacity support (double the amount previously available in accelerators or coprocessors) and over 1.2 TFlops of double precision performance. The Intel Xeon Phi coprocessor 3100 family is designed for high performance per dollar value. The family features 57 cores clocked at 1.1 GHz and 1TFlops of double precision performance. The Intel Xeon Phi coprocessor 5100 family is optimized for high-density environments with the ability to allow sockets to attach directly to a mini-board for use in blade form factors.
    Looking further ahead, the second generation Intel Xeon Phi coprocessors, codenamed "Knights Landing," will be manufactured using Intel's 14nm process technology featuring second generation 3-D tri-gate transistors.  It will be available either on a PCIe card or a host processor (CPU). As a PCIe card-based coprocessor, "Knights Landing" will handle offload workloads from the system's Intel Xeon processors and provide an upgrade path for users of current generation of coprocessors.  As a host processor directly installed in the motherboard socket, it will function as a CPU and enable the next leap in compute density and performance per watt, handling all the duties of the primary processor and the specialized coprocessor at the same time.

Sunday, November 17, 2013

Department of Energy Funds Research in Next-Gen Supercomputer Interconnects

The Department of Energy’s (DOE) Office of Science and the National Nuclear Security Administration (NNSA) have awarded $25.4 million in research and development contracts to five leading companies in high-performance computing (HPC) to accelerate the development of next-generation supercomputers.

Under DOE’s new DesignForward initiative, AMD, Cray, IBM, Intel Federal and NVIDIA will work to advance extreme-scale computing technology on the path to exascale. The contracts, which cover a two-year performance period, will support the design and evaluation of interconnect architectures for future advanced HPC architectures. Such interconnects will tie together hundreds of thousands or millions of processors as building blocks of supercomputers to be used in studying complex problems in unprecedented detail. Intel will focus on interconnect architectures and implementation approaches, Cray on open network protocol standards, AMD on interconnect architectures and associated execution models, IBM on energy-efficient interconnect architectures and messaging models, and NVIDIA on interconnect architectures for massively threaded processors.

“Exascale computing is key to NNSA’s capability of ensuring the safety and security of our nuclear stockpile without returning to underground testing,” said Robert Meisner, director of the NNSA Office of Advanced Simulation and Computing program. “The resulting simulation capabilities will also serve as valuable tools to address nonproliferation and counterterrorism issues, as well as informing other national security decisions.”

“In an era of fierce international HPC competition, the development of exascale computing becomes critical not only to our national security missions but to the nation’s economic competitiveness in the global marketplace,” said William Harrod, FastForward Program Manager and Research Division Director for DOE’s Advanced Scientific Computing Research program. “This partnership between industry, the DOE Office of Science and NNSA supports the development of technology to overcome the obstacles on the road to exascale systems.”

http://www.nersc.gov/news-publications/news/nersc-center-news/2013/department-of-energy-awards-25-4-million-in-contracts-for-extreme-scale-supercomputer-interconnect-design/
