Showing posts with label Facebook. Show all posts

Sunday, July 15, 2018

Facebook to open source eXecutable ARchives (XARs)

Facebook has decided to pursue an open source path for its eXecutable ARchives (XARs), which is a way to create self-contained executables.

XARs, which are already being used at scale at Facebook, are described as "a system for distributing self-contained executables that encapsulate both data and code dependencies."

Facebook says XARs are the fastest way to distribute and execute large Python applications.
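
XAR's own tooling is what Facebook is open sourcing; purely to illustrate the general idea of a self-contained Python executable that bundles code and data into one runnable file, here is a minimal sketch using only the standard library's zipapp module. The "myapp" directory is hypothetical and assumed to contain __main__.py plus the data files and pure-Python dependencies the program needs.

# Illustrative only - this is NOT the XAR format or tooling, just the standard
# library's zipapp module showing the concept of a single, self-contained,
# runnable Python artifact. "myapp" is a hypothetical source directory.
import zipapp

zipapp.create_archive(
    "myapp",                             # directory containing __main__.py
    target="myapp.pyz",                  # single self-contained output file
    interpreter="/usr/bin/env python3",  # shebang so ./myapp.pyz runs directly
)

The resulting file can be copied to another host and run directly, which is the distribution property XARs provide at much larger scale.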

https://code.fb.com/data-infrastructure/xars-a-more-efficient-open-source-system-for-self-contained-executables/

Monday, July 9, 2018

Facebook presentation: Optics Inside the Data Center

Mark McKillop, Network Engineer at Facebook, and Katharine Schmidtke, Sourcing Manager of Network Hardware at Facebook, talk about challenges in Facebook's optical networks, both in backbone and in data centers.

The first part of the video covers the optical systems used to connect Facebook's POPs and data centers.

The second part discusses optical scaling challenges inside the data centers, including the potential for onboard optics in future systems.

This 30-minute video presentation was recorded at Facebook's Networking@Scale 2018 event in June in California.

See video:
https://www.facebook.com/atscaleevents/videos/2090069407932819/

Facebook's announced data centers:

U.S.
Prineville, Oregon
Los Lunas, New Mexico
Papillion, Nebraska
Fort Worth, Texas
Altoona, Iowa
New Albany, Ohio
Henrico County, Virginia
Forest City, North Carolina
Newton County, Georgia

Europe
Clonee, Ireland
Luleå, Sweden
Odense, Denmark



Facebook looks to Spiral for self-tuning, real-time services

Facebook's engineering group is developing Spiral, a small embedded C++ library that provides the self-tuning its data infrastructure needs to keep thousands of microservices well calibrated.
Spiral uses machine learning to create data-driven, reactive heuristics for resource-constrained real-time services.

Facebook says software maintenance increasingly requires self-tuning capabilities because it is simply "too difficult to rewrite caching/admission/eviction policies and other manually tuned heuristics by hand."
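
Spiral itself is an embedded C++ library and the post does not describe its API, so the following is only a hypothetical Python sketch of the kind of data-driven, reactive heuristic described above: a cache-admission policy that tunes itself from observed outcomes instead of relying on hand-written rules. The feature names and learning rate are made up.

import math

# Hypothetical sketch (not Spiral's API): a cache-admission heuristic that
# re-tunes itself online from feedback about whether admitted items were
# actually reused, instead of relying on hand-tuned rules.
class SelfTuningAdmission:
    def __init__(self, lr=0.05, threshold=0.5):
        self.lr = lr                 # learning rate for the online updates
        self.threshold = threshold   # admit when predicted reuse prob >= this
        self.weights = {}            # one weight per feature, learned online

    def _score(self, features):
        # Logistic score over a sparse {feature_name: value} mapping.
        z = sum(self.weights.get(k, 0.0) * v for k, v in features.items())
        return 1.0 / (1.0 + math.exp(-z))

    def admit(self, features):
        return self._score(features) >= self.threshold

    def feedback(self, features, was_reused):
        # Simple online logistic-regression update, so the heuristic adapts
        # as the workload drifts.
        error = (1.0 if was_reused else 0.0) - self._score(features)
        for k, v in features.items():
            self.weights[k] = self.weights.get(k, 0.0) + self.lr * error * v

# Made-up usage: decide whether to cache an object, then report back whether
# it was requested again before being evicted.
policy = SelfTuningAdmission()
item = {"size_mb": 0.12, "is_hot_key": 1.0}
if policy.admit(item):
    pass  # insert the object into the cache here
policy.feedback(item, was_reused=True)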

https://code.fb.com/data-infrastructure/spiral-self-tuning-services-via-real-time-machine-learning/


Sunday, July 8, 2018

NEC selected for Bay to Bay Express subsea cable system

NEC has been selected to build a high-performance submarine cable connecting Singapore, Hong Kong and the United States.

A consortium composed of China Mobile International, Facebook and Amazon Web Services is backing the Bay to Bay Express Cable System (BtoBE).

Construction of the nearly 16,000-kilometer optical submarine cable is expected to be completed by the fourth quarter of 2020.

NEC said the BtoBE system will utilize multiple pairs of optical fiber and achieve round trip latency of less than 130 milliseconds.

"NEC is honored to be selected by the BtoBE consortium as the turn-key system supplier for this world record-breaking optical fiber submarine cable system that covers the longest distance without regeneration. The BtoBE, landing at three locations spanning across the Pacific Ocean, is designed so that once completed, it can carry at least 18Tbs of capacity per fiber pair," said Mr. Toru Kawauchi, General Manager of the Submarine Network Division at NEC Corporation. "The BtoBE will provide seamless connectivity and network diversity, while serving to complement other Asia-Pacific submarine cables, among others."

https://www.nec.com/en/press/201807/global_20180709_03.html



Wednesday, June 20, 2018

Instagram crosses 1 billion user milestone, launches video platform

Instagram now has over 1 billion users, a major milestone for the service, which was launched in 2010.

Instagram also introduced IGTV, a new app for long-form, vertical video from Instagram creators. Videos can be up to one hour in length and will be sorted by channels. The service will be rolled out gradually over the next few weeks.

https://www.facebook.com/InstagramEnglish/videos/2021766097857435/


Tuesday, June 19, 2018

Facebook open sources binary optimization and layout tool

Facebook has decided to open source a binary optimization and layout tool (BOLT) that optimizes the placement of instructions in memory, thereby reducing CPU execution time by 2 percent to 15 percent.

BOLT rearranges code inside functions based on their execution profile.

Facebook said the tool has proven highly effective on its complex services, which are built from very large source code bases.
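
BOLT itself operates on compiled binaries and reorders basic blocks using sample profiles; purely as an illustration of the underlying idea of profile-guided layout, the toy Python sketch below orders functions hottest-first so that frequently executed code ends up packed together. The function names and sample counts are invented.

# Toy illustration of profile-guided layout (not BOLT's algorithm): place the
# hottest code first so it shares fewer instruction-cache lines with cold code.
def hot_first_layout(profile):
    """profile maps function name -> executed sample count; returns an order."""
    return sorted(profile, key=profile.get, reverse=True)

# Invented sample-count profile, e.g. gathered with a sampling profiler.
perf_samples = {"parse_request": 9200, "render_page": 4100,
                "log_debug": 35, "handle_error": 3}
print(hot_first_layout(perf_samples))
# -> ['parse_request', 'render_page', 'log_debug', 'handle_error']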

https://code.facebook.com/posts/605721433136474/accelerate-large-scale-applications-with-bolt/


Thursday, June 14, 2018

Facebook to build its next hyperscale data center in Alabama

Facebook has chosen Huntsville, Alabama for its next hyperscale data center location.

Facebook estimates it will invest $750 million in the 970,000 square foot facility.

As with its other data centers, Facebook committed to 100% renewable energy and is looking at new solar projects in the area. The company says it is working with the Tennessee Valley Authority to establish a renewable energy tariff that will let other qualifying customers buy new renewable resources as well.

The Huntsville Data Center could be operational in 2020.


Wednesday, June 6, 2018

Facebook pioneers StatePoint Liquid Cooling for data centers

Facebook is pioneering a StatePoint Liquid Cooling (SPLC) system, developed in partnership with Nortek Air Solutions, that promises to increase the power efficiency of its data centers.

The latest Facebook data centers in certain dry climates currently use a direct evaporative cooling system based on outdoor air rather than water.

Facebook estimates that the new SPLC technique can reduce water usage by more than 20 percent for data centers in hot and humid climates and by almost 90 percent in cooler climates.

The SPLC system is described as an advanced evaporative cooling technology, patented by Nortek, that uses a liquid-to-air energy exchanger, in which water is cooled as it evaporates through a membrane separation layer.

Further details are provided on Facebook's engineering blog.

https://code.facebook.com/posts/1221779261291831/statepoint-liquid-cooling-system-a-new-more-efficient-way-to-cool-a-data-center/

Wednesday, May 30, 2018

Facebook plans next data center in Utah

Facebook will build one of its hyperscale data centers in Eagle Mountain, Utah.

The 970,000 square foot Eagle Mountain Data Center will be powered by 100% renewable energy.

Facebook said the Eagle Mountain project represents an investment of more than $750 million.

The data center will use outside air to cool its servers.

Wednesday, May 23, 2018

Facebook open sources Katran, its network load balancer

Facebook is releasing its Katran forwarding plane software library, which powers the network load balancer used in Facebook's infrastructure.

Facebook said that by sharing Katran with the open source community, it hopes others can improve the performance of their load balancers and use it as a foundation for future work.

Katran is deployed today on backend servers in Facebook’s points of presence (PoPs). Katran offers a software-based solution to load balancing with a completely reengineered forwarding plane that takes advantage of two recent innovations in kernel engineering: eXpress Data Path (XDP) and the eBPF virtual machine.
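
Katran's forwarding plane is written in C against XDP and eBPF and uses a Maglev-style consistent hash; purely to illustrate the core idea of consistently mapping a flow to a backend, here is a small Python sketch using rendezvous hashing, with hypothetical server addresses.

import hashlib

# Hypothetical backend ("real server") addresses.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def pick_backend(flow, backends=BACKENDS):
    """Map a flow (e.g. a 5-tuple) to the backend with the highest hash,
    so packets of the same connection keep hitting the same server."""
    def weight(backend):
        digest = hashlib.sha256(f"{flow}|{backend}".encode()).hexdigest()
        return int(digest, 16)
    return max(backends, key=weight)

# Example flow: (src IP, src port, dst IP, dst port, protocol).
flow = ("192.0.2.7", 51000, "203.0.113.5", 443, "tcp")
print(pick_backend(flow))
# If one backend is removed, only the flows that hashed to it move, which is
# the property a load balancer wants as servers come and go.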


https://code.facebook.com/posts/1906146702752923/open-sourcing-katran-a-scalable-network-load-balancer/

On GitHub:  https://github.com/facebookincubator/katran

Thursday, April 26, 2018

Facebook's reach continues to grow

Despite the well-publicized uproar about data privacy, Facebook continues to add users worldwide at a rapid pace.

  • Daily active users (DAUs) – DAUs were 1.45 billion on average for March 2018, an increase of 13% year-over-year.
  • Monthly active users (MAUs) – MAUs were 2.20 billion as of March 31, 2018, an increase of 13% year-over-year.
  • Mobile advertising revenue – Mobile advertising revenue represented approximately 91% of advertising revenue for the first quarter of 2018, up from approximately 85% of advertising revenue in the first quarter of 2017.
  • Capital expenditures – Capital expenditures for the first quarter of 2018 were $2.81 billion.

"Despite facing important challenges, our community and business are off to a strong start in 2018," said Mark Zuckerberg, Facebook founder and CEO. "We are taking a broader view of our responsibility and investing to make sure our services are used for good. But we also need to keep building new tools to help people connect, strengthen our communities, and bring the world closer together."

Wednesday, April 4, 2018

Facebook: Likely that most user public profile data has been scraped

Most people on Facebook could have had their public profile scraped by malicious actors or bots, according to a new blog posting by Mike Schroepfer, Facebook's Chief Technology Officer.  Data scraping occurs when a bot submits a person's email or phone number to a Facebook API to gain access to other public data on their account.

Facebook also confirmed that it analyzes call and text history for people using Messenger or Facebook Lite, but that it does not collect the content of messages. The tracking is described as an opt-in capability to help users build more useful contact lists.

https://newsroom.fb.com/news/2018/04/restricting-data-access/

Wednesday, December 13, 2017

Facebook plans big expansion of Oregon data center

Facebook announced plans to add 900,000 square feet to its data center campus in Prineville, Oregon.

The expansion comes in the form of two new buildings expected to enter service in 2020 and 2021.

The Prineville site is where Facebook opened its first owned data center eight years ago. Since then, the site has expanded to three buildings encompassing 1.25 million square feet of space.

Wednesday, November 15, 2017

Facebook to triple the size of its Los Lunas data center

Roughly one year after announcing plans to build one of its hyperscale data centers in Los Lunas, New Mexico, Facebook has said it will triple the size of the project.

Currently, phase one of Facebook's data center is under construction. This includes two buildings and an admin space, totaling 970,000 square feet.

Facebook now plans to build an additional four buildings on the site, creating a six-building data center campus. The four additional buildings will add nearly 2 million square feet. Facebook now estimates its investment in the Los Lunas Data Center at more than $1 billion. Construction of these phases will continue through 2023.

Los Lunas is located about 40 km (25 miles) south of Albuquerque at an elevation of 1,480 meters (4,856 ft).

Wednesday, August 16, 2017

Facebook picks Ohio for its latest data center

Facebook has selected New Albany, Ohio as the location for its 10th major data center. New Albany is a town of about 8,500 people in central Ohio, about 20 miles northeast of Columbus, at an elevation of 1,000 feet.

Like Facebook's other recent data center projects, this new facility will be powered 100% by renewable energy and will use Open Compute Project architecture and principles, including direct evaporative cooling with outdoor air.

The New Albany data center will be 900,000 square feet in size and located on a 22 acre parcel. Media reports stated that Facebook plans to invest $750 million in the project. The New Albany data center is expected to go online in 2019.

https://www.facebook.com/NewAlbanyDataCenter/?fref=mentions

Locations of Facebook data centers:

  • Prineville, Oregon
  • Forest City, North Carolina
  • Luleå, Sweden
  • Altoona, Iowa
  • Fort Worth, Texas
  • Clonee, Ireland
  • Los Lunas, New Mexico
  • Odense, Denmark (announced Jan 2017)


Thursday, May 11, 2017

Facebook dreams of better network connectivity platforms – Part 2

Preamble

Facebook has a stated goal of reducing the cost of network connectivity by an order of magnitude. To achieve this its labs are playing with millimetre wave wireless, free-space optics and drones in the stratosphere.

Project Aquila takes flight

At this year's F8 conference, Facebook gave an update on the Aquila drone aircraft, which is being assembled in California's Mojave Desert. Pretty much everything about this initiative, from its name to its sleek design, has an aura that says 'this is cool'; who wouldn't want to be developing a solar-powered drone with the wingspan of a Boeing 737? Using millimetre wave technology onboard Aquila, Facebook has achieved data transmission speeds of up to 36 Gbit/s over a 13 km distance, and using free-space optical links from the aircraft it has achieved 80 Gbit/s over the same distance.

Several media sources reported a technical setback last year (rumours of a cracked airframe), but those issues appear to be in the past, or at least no longer relevant. At F8, Facebook said Aquila has progressed and is now ready for field testing. However, here again, one element that seems to be missing is the business case. Just where is this aircraft going to fly, and who will pay for it?

As described by Facebook, Aquila will serve regions of the planet with poor or no Internet access. Apparently, this would not include the oceans and seas, nor the polar regions, where such an aircraft might have to hover for months or years before serving even one customer. Satellites already cover much of our planet's surface, and for extremely remote locations this is likely to remain the only option for Internet access. New generations of satellites, including medium earth orbit (MEO) constellations, are coming with improved latency and throughput. So Facebook's Aquila must aim to be better than the satellites.

The aircraft is designed to soar and circle at altitudes of up to 90,000 feet during the day, slowly descending to 60,000 feet by the end of the night. The backhaul presumably will be a free-space laser link to a ground station below. At such heights, Aquila would be above the weather and above the jet stream. During the day, with an unobscured view of the sun, it would recharge the batteries needed to keep flying at night.

Apart from satellites, the alternative architecture for serving such regions would be conventional base stations mounted on tall masts and connected via fibre, microwave or satellite links. Many vendors already offer solar-powered versions of these base stations, and there are plenty of case studies of their successful use in parts of Africa. The advantages over a high-flying drone are obvious: mature technology, fixed towers, known costs, and no possibility of dangerous or embarrassing crashes.

One could imagine the Facebook approach bringing new Internet access possibilities to areas such as the Sahara, the Atacama, or islands in the Indonesian archipelago. But it is not clear whether Aquila's onboard radios would be powerful enough to penetrate dense forests, such as the Amazon or the Congo. So, if the best deployment scenario is a desert or island with some inhabitants but insufficient Internet access, why is satellite service not a viable option? The likely answer again is economics. Perhaps the populations living in these regions simply have not had the money to purchase enough smartphones or laptops to make it worthwhile for a carrier to bring service.

A further consideration worth noting is that it may be difficult for an American company to secure permission to have a fleet of drone aircraft circling permanently over a sovereign nation. Intuitively, many people would not feel comfortable with a U.S. drone circling overhead, even if it were delivering faster social media.

Designing a communications platform for emergency deployments

Facebook's connectivity lab is also interested in disaster preparedness. At the F8 keynote, it unveiled Tether-tenna, a helicopter-drone that carries a base station and is connected to a mooring station by a tether carrying fibre and high-voltage power. The system is designed for rapid deployment after a natural disaster and could provide mobile communications over a wide area. But is it a complex technology that provides only minimal benefit (certainly not an order of magnitude) over existing solutions?

The closest equivalent in the real world is the cellular-on-wheels (COWs) segment, which is now commonly used by most mobile operators for extending or amplifying their coverage during special events such as outdoor concerts and football matches. A typical COW is really just a base station mounted on a truck or trailer. After being hauled to the desired location, the units can be put into operation in a matter of minutes, using on-board batteries, diesel generators or attachment to the electrical grid. The units often have telescoping masts that extend 4-5 metres in height.

In comparison to a COW, Facebook's Tether-tenna heli-drone will have a height advantage, perhaps 100 metres over the competitors, enabling it to extend coverage over a greater range. However, the downsides are quite apparent too. Base station weight restrictions on the heli-drone, which also must carry the weight of the tether, will be more limiting than on a mast, and this means that the Tether-tenna will not provide the density of coverage possible via a COW, thereby limiting its potential use cases.

In addition, a crashing heli-drone could do a lot of damage to people or property on the ground, and wind would be a major factor, as would lightning strikes. There is also the possibility of collisions with other drones, airplanes or birds. Therefore ensuring safety might require a human operator to be present when the drone is flying, and insurance costs inevitably would be higher than any of the many varieties of COWs that are already in service.

AT&T has a more elegant name for this gear, preferring to call them Cells on Light Trucks (COLTs). During the recent Coachella 2017 music festival in California, AT&T deployed four COLTs equipped with high-capacity drum set antennae, which offer 30x the capacity of a traditional, single-beam antenna. AT&T reported that the COLTs were instrumental in handling the 40 Tbytes of data that traversed its network during the multi-day event - the equivalent of 113 million selfies. Data traffic from Coachella was up 37% over last year, according to AT&T. Would a heli-drone make sense for a week-long event such as this?  Probably not, but it's still a cool concept.

All of this raises the question: is a potential business case even considered before a development project gets funded at Facebook?

In conclusion, Facebook is a young company with a big ambition to connect the unconnected. Company execs talk about a ten-year plan to advance these technologies, so they have the time and money to play with multiple approaches that could make a difference. A business case for these three projects may not be apparent now, but they could serendipitously evolve into something valuable.

Wednesday, May 10, 2017

Facebook dreams of better network connectivity platforms – Part 1


Facebook's decision to launch the Open Compute Project (OCP) six years ago was a good one. At the time, Facebook was in the process of opening its first data centre, having previously leased space in various third party colocation facilities. As it constructed this first facility in Prineville, Oregon the company realised that it was going to have to build faster, cheaper and smarter if this strategy were to succeed, and that to keep up with its phenomenal growth it would have to open massive data centres in multiple locations.

In 2016, Facebook kicked off the Telecom Infra Project (TIP) with a mission to take the principles of the Open Compute Project (OCP) model and apply them to software systems and components involved in access, backhaul and core networks. The first TIP design ideas look solid and have quickly gained industry support. Among these is Voyager, a 'white box' transponder and routing platform based on Open Packet DWDM. This open line system will include YANG data models of each component in the system and an open northbound software interface (such as NETCONF or Thrift) to the control plane software, essentially allowing multiple applications to run on top of the open software layer. The DWDM transponder hardware includes DSP ASICs and complex optoelectronic components, and thus accounts for much of the cost of the system.
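
As a concrete, purely illustrative example of what an open NETCONF/YANG northbound interface looks like from an application's point of view, the Python sketch below uses the ncclient library to fetch a device's running configuration; the address and credentials are placeholders, and the assumption that a Voyager-class device is reachable this way is mine rather than something stated in the TIP specifications.

# Hypothetical sketch: retrieving YANG-modelled state from a device's open
# NETCONF northbound interface using the ncclient library. The host,
# credentials and port are placeholders.
from ncclient import manager

with manager.connect(host="198.51.100.10", port=830,
                     username="admin", password="admin",
                     hostkey_verify=False) as m:
    running = m.get_config(source="running")  # running configuration as XML
    print(running)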

The hardware design leverages technologies implemented in Wedge 100, Facebook's top-of-rack switch, including the same Broadcom Tomahawk switching ASIC. It also uses the DSP ASIC and optics module (AC400) from Acacia Communications for the DWDM line side with their open development environment. Several carriers and data centre operators have already begun testing Voyager platforms from multiple vendors.

In November 2016, Facebook outlined its next TIP plans, including Open Packet DWDM for metro and long-haul optical transport networks. The idea is to enable a clean separation of software and hardware based on open specifications. Again, there is early support for a platform with real world possibilities, either within Facebook's global infrastructure or as an open source specification that is ultimately adopted by others.

What's cooking at Facebook's network connectivity labs

At its recent F8 Developer Conference in San Jose, Facebook highlighted several other telecom-related R&D projects out of its network connectivity lab that lean more toward whimsical fancy than down-to-earth practicality. In the big picture, these applied research projects could be game-changers in the race to reach the billions of people worldwide who currently lack Internet access - the potential Facebook users of the future. Facebook said its goal here is to bring down the cost of connectivity by an 'order of magnitude', a pretty high bar considering the pace of improvement already seen in mobile networking technologies.

This article will focus on three projects mentioned at this year's F8 keynote, namely: Terragraph, a 60 GHz multi-node wireless system for dense urban areas that uses radios based on the WiGig standard; Aquila, a solar-powered drone for Internet delivery from the stratosphere; and Tether-tenna, a sort of helicopter drone carrying a base station. It is not clear whether these three projects will eventually become part of TIP, or even whether they will progress beyond lab trials.

Terragraph

Terragraph is Facebook's multi-node wireless system for delivering high-speed Internet connectivity to dense urban areas, capable of delivering gigabit speeds to mobile handsets. The scheme, first announced at last year's F8 conference, calls for IPv6-only Terragraph nodes to be placed at 200-metre intervals. Terragraph will incorporate commercial off-the-shelf components and aim for high-volume, low-cost production. Facebook noted that up to 7 GHz of bandwidth is available in the unlicensed 60 GHz band in many countries, while U.S. regulators are considering expanding this to a total of 14 GHz. Terragraph will also leverage an SDN-like cloud compute controller and a new modular routing protocol that Facebook has optimised for fast route convergence and failure detection. The architecture also tweaks the MAC layer to address shortcomings of TCP/IP over a wireless link. The company says the TDMA-TDD MAC layer delivers up to a 6x improvement in network efficiency while being more predictable than the existing WiFi/WiGig standards.
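
As a rough, back-of-the-envelope sizing exercise based only on the 200-metre node spacing quoted above (real deployments would follow street grids and line-of-sight constraints), the sketch below estimates how many nodes a simple square grid would need to blanket a given area; the 10 km² district is just an example.

import math

# Crude estimate: number of nodes in a square grid with 200 m spacing needed
# to cover an area treated as a square. Real Terragraph deployments follow
# streets and rooftops, so this is only an order-of-magnitude figure.
def nodes_for_area(area_km2, spacing_m=200):
    side_m = math.sqrt(area_km2) * 1000           # side of an equivalent square
    per_side = math.ceil(side_m / spacing_m) + 1  # grid points along one edge
    return per_side * per_side

print(nodes_for_area(10))   # a ~10 km^2 district needs on the order of 290 nodes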

At the 2017 F8 conference, Facebook talked about how Terragraph is being tested in downtown San Jose, California, a convenient location given that it is right next door to Facebook. Weather will not be a significant factor, since San Jose does not experience the rolling summer fog of nearby San Francisco, nor does it suffer torrential tropical downpours, whiteout blizzard conditions, scorching summer heat, or Beijing-style air pollution that could obscure line-of-sight.

While the trial location might be ideal, one should also consider in which cities Terragraph would be practical. There are plenty of WiFi hotspots throughout San Jose, smartphone penetration is pretty much universal, and nearly everyone has 4G service; heavy data users have the option of unlimited plans from the major carriers. So perhaps San Jose serves only as the technical trial, and the business case is more applicable to dense urban areas such as Mexico City, Manaus, Lagos or Nairobi.

At the F8 conference, Facebook showed an AI system being used to optimise small cell placement from a 3D map of the city centre. The 3D map included data for the heights of buildings, trees and other obstacles. The company said this AI system alone could be a game changer simply by eliminating the many hours of human engineering that would be needed to scope out good locations for small cells. However, the real world is more complicated. Just because the software identifies a particular light pole as an ideal femtocell placement does not mean that the city will approve it. There are also factors such as neighbour objections, pole ownership, electrical connections, etc., that will stop the process from being fully automated. If this Terragraph system is aimed at second or third tier cities in developing countries, there is also the issue of chaotic development all around. In the shanty towns surrounding these big conurbations, legal niceties such as property boundaries and rights-of-way can be quite murky. Terragraph could be quite useful in bringing low-cost Internet into these areas, but it probably does not need fancy AI to optimise each small cell placement.

Generally speaking, 3G and now 4G services have arrived in most cities worldwide. The presumption is that Facebook is not seeking to become its own mobile carrier in developing countries but that it would partner with existing operators to augment their networks. Meanwhile, one suspects that the reason carriers have been slow to upgrade capacity in certain neighbourhoods or cities is more economic than technical. It is probably not a lack of spectrum that is holding them back, nor a lack of viable femtocell products or microwave backhaul links, but simply a lack of financial capital, a weak return on investment, or red tape. One reason often cited is that over-the-top services, such as Facebook, suck all the value out of the network, leaving the mobile operator with very thin margins and little customer stickiness.


In Part 2 of this article we will look at Facebook's Aquila and Tether-tenna concepts.

Friday, May 5, 2017

Facebook's march to augmented reality

The big theme coming out of Facebook's recent F8 Developer Conference in San Jose, California was augmented reality (AR). Mark Zuckerberg told the audience that the human desire for community has been weakened over time, and said he believes social media could play a role in strengthening these ties.

Augmented reality begins as an add-on to Facebook Stories, its answer to Snapchat. Users simply take a photo and then use the app to place an overlay on top of the image, such as a silly hat or fake moustache, while funky filters keep users engaged and help them create a unique image. Over time, the filter suggestions become increasingly smart, adapting to the content in the photo - think of a perfect frame if the photo is of the Eiffel Tower. The idea is to make messaging more fun. In addition, geo-location data might be carried to the FB data centre to enhance the intelligence of the application, but most of the processing can happen on the device.

Many observers saw Facebook's demos as simply a needed response to Snapchat. However, Facebook is serious about pushing this concept far beyond cute visual effects for photos and video. AR and VR are key principles for what Facebook believes is the future of communications and community building.

As a thought experiment, one can consider some of the networking implications of real-time AR. In the Facebook demonstration, a user turns on the video chat application on their smartphone. While the application parameters of this demonstration are not known, the latest smartphones can record in 4K at 30 frames per second, and will soon be even sharper and faster. Apple's Facetime requires about 1 Mbit/s for HD resolution and this has been common for several years (video at 720p and 30 fps). AR certainly will benefit from high resolution, so one can estimate the video stream leaves the smartphone on a 4 Mbit/s link (this guesstimate is on the low end). The website www.livestream.com calculates a minimum of 5 Mbit/s upstream bandwidth for launching a video stream with high to medium resolution. LTE-Advanced networks are capable of delivering 4 Mbit/s upstream, with plenty of headroom, and WiFi networks are even better.

To identify people, places and things in the video, Facebook will have to perform sophisticated graphical processing with machine learning. Currently this cannot be done locally by the app on the smartphone and so will need to be done at a Facebook data centre. So the 4 Mbit/s stream will have to leave the carrier network and be routed to the nearest Facebook data centre.

It is known from previous Open Compute Project (OCP) announcements that Facebook is building its own AI-ready compute clusters. The first design, called Big Sur, is an Open Rack-compatible chassis that incorporates eight high-performance GPUs of up to 300 watts each, with the flexibility to configure between multiple PCI-e topologies. It uses NVIDIA's Tesla accelerated computing platform. This design was announced in late 2015 and subsequently deployed in Facebook data centres to support its early work in AI. In March, Facebook unveiled Big Basin, its next-gen GPU server, capable of training machine learning models that are 30% bigger than those handled on Big Sur, thanks to greater arithmetic throughput and a memory increase from 12 to 16 Gbytes. The new chassis also allows for disaggregation of CPU compute from the GPUs, something that Facebook calls JBOG (just a bunch of GPUs), which should bring the benefits of virtualisation when many streams need to be processed simultaneously. The engineering anticipates that increased PCIe bandwidth will be needed between the GPUs and the CPU head nodes, hence a new Tioga Pass server platform was also needed.

The Tioga Pass server features a dual-socket motherboard, with DIMMs on both PCB sides for maximum memory configuration. The PCIe slot has been upgraded from x24 to x32, which allows for two x16 slots, or one x16 slot and two x8 slots, to make the server more flexible as the head node for the Big Basin JBOG. This new hardware will need to be deployed at scale in Facebook data centres. Therefore, one can envision that the video stream originates at 4 Mbit/s and travels from the user's smartphone and is routed via the mobile operator to the nearest Facebook data centre.

Machine learning processes running on the GPU servers perform what Facebook terms Simultaneous Localisation and Mapping (SLAM). The AI essentially identifies the three-dimensional space of the video and the objects or people within it. The demo showed a number of 3D effects being applied to the video stream, such as lighting/shading and the placement of other objects or text. Once this processing has been completed, the output stream must continue to its destination, the other participants on the video call. Maybe further encoding has compressed the stream, but Facebook will still have to burn some amount of outbound bandwidth to hand the video stream over to another mobile operator for delivery via IP to the app on the recipient's smartphone. Most likely, the recipient(s) of the call will have their video cameras turned on and these streams will also need the same AR processing in the reverse direction. Therefore, we can foresee a two-way AR video call burning tens of megabits of WAN capacity to/from the Facebook data centre.

The question of scalability

Facebook does not charge users for accessing any of its services, which generally roll out across the entire platform in one go or in a rapid series of upgrade steps. Furthermore, Facebook often reminds us that it is now serving a billion users worldwide. So clearly, it must be thinking about AR on a massive scale. When Facebook first began serving videos from its own servers, the scalability question was also raised, but this test was passed successfully thanks to the power of caching and CDNs. When Facebook Live began rolling out, it also seemed like a stretch that it could work at global scale. Yet now there are very successful Facebook video services.

Mobile operators should be able to handle large numbers of Facebook users engaging in 4 Mbit/s upstream connections, but each of those 4 Mbit/s streams will have to make a visit to the FB data centre for processing. Fifty users will burn 200 Mbit/s of inbound capacity to the data centre, 500 users will eat up 2 Gbit/s, 5,000 users 20 Gbit/s, and 50,000 users 200 Gbit/s. For mobile operators, if AR chats prove to be popular, lots of traffic will be moving in and out of Facebook data centres, and one could easily envision a big carrier like Verizon or Sprint having more than 500,000 simultaneous users on Facebook AR. This would present a challenge if, say, 10 million users decide to try it out on a Sunday evening; that would demand a lot of bandwidth that network engineers would have to find a way to support. Another point is that, from experience with other chat applications, people are no longer accustomed to economising on the length of the call or the number of participants. One can expect many users to kick off a Facebook AR call with friends on another continent and keep the stream open for hours.
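
The arithmetic above is easy to reproduce; the short sketch below assumes the same 4 Mbit/s uplink per active AR caller, and the user counts are just sample points.

# Aggregate inbound bandwidth to a data centre if every active AR caller
# sends a 4 Mbit/s uplink stream for processing. The 4 Mbit/s figure is the
# article's assumption; the user counts are arbitrary examples.
PER_STREAM_MBPS = 4

def inbound_gbps(users, per_stream_mbps=PER_STREAM_MBPS):
    return users * per_stream_mbps / 1000  # Mbit/s -> Gbit/s

for users in (50, 500, 5_000, 50_000, 500_000, 10_000_000):
    print(f"{users:>10} users -> {inbound_gbps(users):,.1f} Gbit/s inbound")
# 10 million simultaneous callers work out to roughly 40 Tbit/s of inbound
# traffic, which is the scale of problem the paragraph above points at.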

Of course, there could be clever compression algorithms in play so that the 4 Mbit/s at each end of the connection could be reduced, while if the participants do not move from where they are calling and nothing changes in the background, perhaps the AR can snooze, reducing the amount of processing needed and the bandwidth load. In addition, perhaps some of the AR processing can be done on next gen smartphones. However, the opposite could also be true, where AR performance is enhanced by using 4K, multiple cameras per user are used on the handset for better depth perception, and the video runs at 60 fps or faster.

Augmented reality is so new that it is not yet known whether it will take off quickly or be dismissed as a fad. Maybe it will only make sense in narrow applications. In addition, by the time AR calling is ready for mass deployment, Facebook will have more data centres in operation with a lot more DWDM to provide its massive optical transport – for example the MAREA submarine cable across the Atlantic Ocean between Virginia and Spain, which Facebook announced last year in partnership with Microsoft. The MAREA cable, which will be managed by Telxius, Telefónica's new infrastructure company, will feature eight fibre pairs and an initial estimated design capacity of 160 Tbit/s. So what will fill all that bandwidth? Perhaps AR video calls, but the question then is, will metro and regional networks be ready?

Wednesday, April 26, 2017

Orange Teams with Facebook on Start-up Accelerator

Global telco Orange announced that, as a member of the Telecom Infra Project (TIP) and together with Facebook, it is launching the Orange Fab France Telecom Track accelerator, designed to support start-ups focused on network infrastructure development.

Through the initiative, selected start-ups will be mentored by Orange and provided with access to its global resources, as well as support from TIP Ecosystem Accelerator Centres (TEAC) and Facebook.

As part of the initiative, Orange is working with TIP and Facebook to identify and support start-ups focused on network infrastructure technology with the launch of the new Telecom Track as part of its Orange Fab accelerator program in France. The partnership will aim to identify the best innovations and talent within the sector and provide start-ups with support and guidance from experts at Orange, TIP and Facebook, as well as facilitate collaboration and investment opportunities.

The project will be managed through Orange Fab France, Orange's established accelerator program for start-ups located at the Orange Gardens campus in Paris that is dedicated to R&D. The program also has the support of Orange Digital Ventures. By engaging with experts from Orange and its partners, start-ups will be provided with support in tackling network-related issues such as network management and access technologies.

Start-ups selected for the program will receive the benefits offered as part of the existing Orange Fab program, including the opportunity to participate in dedicated workshops, mentoring sessions with specialists and an optional Euro 15,000 in funding. They will also be provided with work space at the Orange Gardens, where the company's R&D teams are based. Start-ups will also have access to experts from the TIP community, TEAC and Facebook.

Orange has launched a call for projects to French start-ups that runs until May 14th; following evaluation of submissions, start-ups will be selected to join the acceleration program and can present at a launch event planned for June that will be attended by Orange, TIP and Facebook executives, as well as partners and venture capitalists.

Friday, March 24, 2017

Microsoft's Project Olympus provides an opening for ARM

A key observation from this year's Open Compute Summit is that the hyper-scale cloud vendors are indeed calling the shots in terms of hardware design for their data centres. This extends all the way from the chassis configurations to storage, networking, protocol stacks and now customised silicon.

To recap, Facebook's newly refreshed server line-up now has 7 models, each optimised for different workloads: Type 1 (Web); Type 2 - Flash (database); Type 3 – HDD (database); Type 4 (Hadoop); Type 5 (photos); Type 6 (multi-service); and Type 7 (cold storage). Racks of these servers are populated with a ToR switch followed by sleds with either the compute or storage resources.

In comparison, Microsoft, which was also a keynote presenter at this year's OCP Summit, is taking a slightly different approach with its Project Olympus universal server. Here the idea is also to reduce the cost and complexity of its Azure rollout in hyper-scale data centres around the world, but to do so using a universal server platform design. Project Olympus uses either a 1 RU or 2 RU chassis and various modules for adapting the server to different workloads or electrical inputs. Significantly, it is the first OCP server to support both Intel and ARM-based CPUs.

Not surprisingly, Intel is looking to continue its role as the mainstay CPU supplier for data centre servers. Project Olympus will use the next generation Intel Xeon processors, code-named Skylake, and with its new FPGA capability in-house, Intel is sure to supply more silicon accelerators for Azure data centres. Jason Waxman, GM of Intel's Data Center Group, showed off a prototype Project Olympus server integrating Arria 10 FPGAs. Meanwhile, in a keynote presentation, Microsoft Distinguished Engineer Leendert van Doorn confirmed that ARM processors are now part of Project Olympus.

Microsoft showed Olympus versions running Windows Server on Cavium's ThunderX2 and Qualcomm's 10 nm Centriq 2400, which offers 48 cores. AMD is another CPU partner for Olympus with its x86 processor, code-named Naples. In addition, there are other ARM licensees waiting in the wings with designs aimed at data centres, including MACOM (AppliedMicro's X-Gene 3 processor) and Nephos, a spin-out from MediaTek. For Cavium and Qualcomm, the case for ARM-powered servers comes down to optimised performance for certain workloads, and in OCP Summit presentations both companies cited web indexing and search as one of the first applications that Microsoft is using to test their processors.

Project Olympus is also putting forward an OCP design aimed at accelerating AI in its next-gen cloud infrastructure. Microsoft, together with NVIDIA and Ingrasys, is proposing a hyper-scale GPU accelerator chassis for AI. The design, code named HGX-1, will package eight of NVIDIA's latest Pascal GPUs connected via NVIDIA’s NVLink technology. The NVLink technology can scale to provide extremely high connectivity between as many as 32 GPUs - conceivably 4 HGX-1 boxes linked as one. A standardised AI chassis would enable Microsoft to rapidly rollout the same technology to all of its Azure data centres worldwide.

In tests published a few months ago, NVIDIA said its earlier DGX-1 server, which uses Pascal-powered Tesla P100 GPUs and an NVLink implementation, was delivering 170x the performance of standard Xeon E5 CPUs when running Microsoft's Cognitive Toolkit.

Meanwhile, Intel has introduced the second generation of its Rack Scale Design for OCP. This brings improvements in the management software for integrating OCP systems in a hyper-scale data centre and also adds open APIs to the Snap open source telemetry framework so that other partners can contribute to the management of each rack as an integrated system. This concept of easier data centre management was illustrated in an OCP keynote by Yahoo Japan, which delivers an astonishing 62 billion page views per month to its users and remains the most popular website in that nation. The Yahoo Japan presentation focused on an OCP-compliant data centre it operates in the state of Washington, its only overseas data centre. The remote facility is manned by only a skeleton crew which, thanks to streamlined OCP designs, is able to perform most hardware maintenance tasks, such as replacing a disk drive, memory module or CPU, in less than two minutes.

One further note on Intel's OCP efforts relates to its 100 Gbit/s CWDM4 silicon photonics modules, which it states are ramping up in shipment volume. These are lower-cost 100 Gbit/s optical interfaces that reach up to 2 km for cross-data-centre connectivity.

On the OCP-compliant storage front, not everything is flash, with spinning HDDs still in play. Seagate has recently announced a 12 Tbyte 3.5-inch HDD engineered to accommodate workloads of 550 Tbytes annually. The company claims an MTBF (mean time between failures) of 2.5 million hours, and the drive is designed to operate 24/7 for five years. These 12 Tbyte drives enable a single 42U rack to hold over 10 Pbytes of storage, quite an amazing density considering how much bandwidth would be required to move this volume of data.
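
A quick sanity check on that rack-density figure, counting raw drives only and ignoring enclosure packaging and data-protection overhead, which the announcement does not detail:

# Raw number of 12 Tbyte drives needed to reach 10 Pbytes in one rack;
# enclosure packaging and RAID/erasure-coding overhead are ignored.
DRIVE_TB = 12
TARGET_PB = 10

drives = TARGET_PB * 1000 / DRIVE_TB
print(round(drives))   # ~833 drives, i.e. roughly 20 drives per U in a 42U rack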


Google did not make a keynote appearance at this year’s OCP Summit, but had its own event underway in nearby San Francisco. The Google Cloud Next event gave the company an even bigger stage to present its vision for cloud services and the infrastructure needed to support it.
