Showing posts with label Spark. Show all posts

Tuesday, July 21, 2020

New Zealand's Spark builds OTN 2 with Ciena

Spark has completed the first stage of its next generation Optical Transport Network (OTN 2) using equipment, software and services from Ciena.

The OTN 2 rollout is a two-year project, which has started in Auckland, and will expand towards Hamilton, Wellington and Christchurch. Initially, the new OTN equipment will be an express overlay to the existing core network, then will eventually replace the existing OTN.

The network, which features ‘self healing’ capabilities to automatically restore services after events such as natural disasters, is now live and operating at 800 Gbps. Spark’s previous links operated at 100 or 200 Gbps. The rollout includes Ciena’s 6500 flexible grid colorless, directionless and contentionless (CDC) photonic line system with advanced control plane capabilities, WaveLogic 5 Extreme coherent optics, and Manage, Control and Plan domain controller with Liquid Spectrum analytics.

Campbell Fraser, Spark’s Technology Tribe Lead, said that the rollout of OTN 2 will deliver increased resiliency, enabling Spark to respond and restore service much faster after events such as the Kaikoura earthquakes.

“The ‘self-healing’ technology, which we believe is a first for New Zealand, will minimise the impact of network outages. These are caused by cuts in the fibre cable from earthquakes, floods, landslips, construction works or rodents damaging cables. Currently, restoring service is a manual process but the sharp growth in network traffic means manual restoration is becoming unmanageable. The optical restoration ‘self-healing’ technology allows the light signals that carry the data to automatically change their path after a fibre cut, so this is a big step forward. We expect to be able to restore services much more quickly so we can get customers back up and running.”

“A self-healing and resilient network that can automatically fine tune capacity and dynamically adapt to evolving user demands and unexpected fibre cuts or natural disasters is critical in today’s digital-first environment,” said Rick Seeto, Vice President and General Manager of Asia-Pacific and Japan, Ciena.

Thursday, December 15, 2016

Databricks Raises $60 Million for Cloud Data Analytics with Apache Spark

Databricks, a start-up founded by the team that created Apache Spark, announced $60 million in Series C funding.

Databricks offers a data platform in the cloud powered by Apache Spark. The company said that as Spark's adoption moves into the mainstream among large data-driven enterprises in all industries, it has seen an explosive uptick in customer demand and adoption, and now serves more than 400 customers.

The funding round was led by New Enterprise Associates (NEA) and included existing Databricks investor, Andreessen Horowitz. The new funding round brings Databricks' total funding to date to $107.5 million.

"Apache Spark has enabled countless enterprises and cutting-edge early adopters to create business value through advanced analytics solutions," said Ali Ghodsi, CEO and Co-Founder at Databricks. "As Spark's adoption and the demand for our managed Spark platform continues to rise, this funding will advance our engineering and go-to-market strategies to address all of our customer's pain points as we continue to grow the Spark community."

http://www.databricks.com

Thursday, April 28, 2016

Levyx Raises $5.4 Million for Big Data Store

Levyx, a start-up based in Irvine, California, announced $5.4 million in Series-A funding for its high-performance processing technology for reducing infrastructure costs associated with big-data applications.

The funding was led by Chicago-based OCA Ventures. Additional investors include Amino Capital (a.k.a. zPark Capital) and Sumavision USA Corporation, as well as individual investors.

Levyx said its "Helium" data engine is built for the modern "open-platform" commodity-hardware datacenter. "Helium is an ultra-low latency datastore that can process tens of millions of queries per second on a single computing node. By leveraging Helium and its core expertise in system software and SSD/NVM technology, Levyx enables its customers to achieve in-memory computing performance at a fraction of the normal cost by using Flash-SSDs (versus much more expensive DRAM) and running on commodity hardware. Levyx’s patent-pending Input/Output software and indexing algorithms take advantage of multicore architectures and flash memory and is also designed to optimize emerging NVM technologies."

“We have seen a huge amount of innovation in the software and storage hardware associated with big-data applications, but there are big inefficiencies because the two sides have been walled off from one another,” CEO and Founder Reza Sadri said. “By fixing this disconnect with a fundamentally new software stack, we pave the way for real-time processing of big-data workloads for the masses. The support and guidance of investors like OCA Ventures, Amino Capital and Sumavision will help us in our quest to make big-data applications dramatically more affordable for everyone.”

http://www.levyx.com

Monday, October 26, 2015

IBM Launches Apache Spark-as-a-Service

IBM is launching a Spark-as-a-Service offering on Bluemix following a successful 13-week Beta program with more than 4,600 developers using it to build intelligent business and consumer apps fueled by data.

IBM also confirmed that it has redesigned more than 15 of its core analytics and commerce solutions with Apache Spark.

Apache Spark was developed by the AMPLab at UC Berkeley as an open-source cluster computing framework. It offers in-memory processing and is known for its ease of use in creating algorithms that harness insight from complex data.

“For data scientists and engineers who want to do more with their data, the power and appeal of open source innovation for technologies like Spark is undeniable,” said Rob Thomas, Vice President of Product Development, IBM Analytics. “IBM is committed to using Spark as the foundation for its industry-leading analytics platform, and by offering a fully managed Spark service on IBM Bluemix, data professionals can access and analyze their data faster than ever before, with significantly reduced complexity.”

http://www-03.ibm.com/press/us/en/pressrelease/47946.wss

Friday, September 25, 2015

Databricks: Apache Spark Outgrowing Hadoop

The number of standalone deployments of Spark eclipses those on YARN as more users run Spark independent of Hadoop, according to a newly published survey of Spark users conducted by Databricks, the company founded by the creators of Apache Spark.

Databricks said that the share of users running Spark in standalone mode (48 percent of respondents) exceeds the share running Spark on YARN (40 percent of respondents). The survey also found that 51 percent of respondents run Spark on a public cloud.

Key findings from the survey include:

  • Spark is outgrowing Hadoop: The most common Spark deployments according to the community are: 48 percent standalone, 40 percent YARN within Hadoop and 11 percent Apache Mesos. Spark users who do not use any Hadoop components have more than doubled in 2015 (from 2014). 
  • Streaming and advanced analytics uses rising: Spark is being used for an increasingly diverse set of applications, particularly by data scientists for machine learning, streaming and graph analysis use cases. In 2015, there are 56 percent more Spark streaming users than in 2014. The production use of advanced analytics, like MLlib for machine learning and GraphX for graph processing, increased from 11 percent in 2014 to 15 percent in 2015. 75 percent of Spark users are also using two or more Spark components (51 percent of Spark users are using three or more Spark components).
  • Spark users are becoming more diverse:  Of those surveyed, 41 percent identified themselves as Data Engineers, while 22 percent of respondents identified themselves as Data Scientists. Spark users are solving a variety of problems in different languages -- Scala (71 percent), Python (58 percent), SQL (36 percent), Java (31 percent) and R (18 percent) -- and all within the same framework.
  • Spark's most popular use cases come to light: Fifty-two percent use Spark for data warehousing, 68 percent use it for business intelligence, 40 percent for processing application and system logs, 48 percent to build recommendation engines, 36 percent for user-facing services and 29 percent for fraud detection and security.
  • Spark is increasing access to big data: Ninety-one percent of those surveyed claim performance as their reason for adoption, while 77 percent cite ease of programming, 71 percent cite ease of deployment, 64 percent cite advanced analytics capabilities and 52 percent cite real-time streaming capabilities.

"The continued growth of Spark has been highly encouraging, as companies are going into production to obtain real business value, and they are doing so in a wide range of environments beyond Hadoop clusters," said Matei Zaharia, creator of Apache Spark and CTO of Databricks. "Databricks and our partners are 100 percent committed to the long-term growth of Spark and we'll continue to make improvements based on this survey data and our ongoing community feedback, to make the most complete big data analytics toolkit accessible to all businesses."

https://databricks.com

Wednesday, September 23, 2015

Google Cloud Dataproc Brings Fast Hadoop & Spark Cluster Provisioning

Google introduced new capabilities for managing clusters of Hadoop and Spark.

Google Cloud Dataproc, which is now in beta, is a managed Spark and Hadoop service that leverages open source data tools for batch processing, querying, streaming, and machine learning. The service can be used to create and manage clusters ranging in size from 3 to hundreds of nodes.

Google said its Cloud Dataproc can create Spark and Hadoop clusters in 90 seconds or less, compared to 5 to 30 minutes using on-premises or IaaS providers.

http://googlecloudplatform.blogspot.com/2015/09/Google-Cloud-Dataproc-Making-Spark-and-Hadoop-Easier-Faster-and-Cheaper.html

Monday, June 15, 2015

IBM Backs Apache Spark for Cloud Data Processing

IBM is putting its weight behind Apache Spark, which is an open source engine for large-scale data processing and compatible with Hadoop data.

Apache Spark can run in Hadoop clusters through YARN or Spark's standalone mode, and it can process data in HDFS, HBase, Cassandra, Hive, and any Hadoop InputFormat. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and machine learning.

IBM said Spark is potentially the most important new open source project in a decade that is being defined by data. As such, IBM plans to embed Spark into its Analytics and Commerce platforms, and to offer Spark as a service on IBM Cloud. The company said it will put more than 3,500 IBM researchers and developers to work on Spark-related projects at more than a dozen labs worldwide; donate its IBM SystemML machine learning technology to the Spark open source ecosystem; and educate more than one million data scientists and data engineers on Spark.

“IBM has been a decades long leader in open source innovation. We believe strongly in the power of open source as the basis to build value for clients, and are fully committed to Spark as a foundational technology platform for accelerating innovation and driving analytics across every business in a fundamental way,” said Beth Smith, General Manager, Analytics Platform, IBM Analytics. “Our clients will benefit as we help them embrace Spark to advance their own data strategies to drive business transformation and competitive differentiation.”

http://www.ibm.com

Wednesday, July 2, 2014

Support Grows for Apache Spark in Big Data Streaming

Cloudera, Databricks, IBM, Intel, and MapR announced their collaboration to collectively broaden the range of tools and technologies in the Hadoop ecosystem that leverage Apache Spark as an underlying processing engine.

Apache Spark is an open-source data analytics cluster computing framework that promises to run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.

The companies said their new collaboration expands upon the Spark momentum to include several key Hadoop projects - starting with the Apache Hive SQL engine (Hive). Using Spark as the underlying execution engine, this effort will improve the performance of batch SQL jobs in Hive, while seamlessly maintaining compatibility with the core Hive code base.  The companies are also investigating ways to adapt Apache Pig to leverage Spark, as well as other popular tools, such as Sqoop and Search.

http://www.cloudera.com/content/cloudera/en/about/press-center/press-releases/2014/07/01/community-effort-driving-standardization-of-apache-spark-through.html

http://spark.apache.org/