Tuesday, July 1, 2014

Databricks Raises $33 Million for Apache Spark-based Cloud

Databricks, a start-up based in Berkeley, California, announced $33 million in Series B funding and the launch of its cloud platform.

Databricks Cloud is powered by Spark, a unified processing engine with native support for interactive queries (Spark SQL), streaming data (Spark Streaming), machine learning (MLlib) and graph computation (GraphX). A single API is used across the entire pipeline.

Databricks said its cloud platform benefits from the rapid pace of innovation in Spark, driven by the 200+ contributors who have made it the most active project in the Hadoop ecosystem.

The company's hosted platform simplifies the provisioning of a Spark cluster. Users specify the desired capacity of a new cluster, and the platform handles the details: provisioning servers on the fly, streamlining the import and caching of data, handling security, and continually patching and updating Spark. This frees users from the typical operational headaches and lets them explore and harness the power of Spark.

Databricks Cloud is currently available on Amazon Web Services. The company is looking to add support for other cloud providers in the future.

The funding round was led by New Enterprise Associates (NEA) with follow-on investment from Andreessen Horowitz.

“Databricks remains committed to developing and expanding Apache Spark fully in the open and continuing to add to the capabilities that made it a vital big data platform,” said Matei Zaharia, CTO of Databricks. “We will continue to commit significant resources to drive open-source innovation in Spark alongside the community. Furthermore, we look forward to enabling a whole new set of users and developers to experience and leverage the power of Spark to drive enterprise value.”

  • Databricks is headed by Ion Stoica (CEO), a Professor of Computer Science at UC Berkeley and a co-founder of Conviva. The Databricks technical team is headed by Matei Zaharia (CTO), the creator of Apache Spark and an Assistant Professor of Computer Science at MIT.
  • Apache Spark, which is part of the Hadoop movement, is an open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley. It promises to run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. http://spark.apache.org/