Showing posts with label Databricks. Show all posts

Thursday, December 15, 2016

Databricks Raises $60 Million for Cloud Data Analytics with Apache Spark

Databricks, a start-up founded by the team that created Apache Spark, announced $60 million in Series C funding.

Databricks offers a cloud data platform powered by Apache Spark. The company said that as Spark adoption moves into the mainstream at large, data-driven enterprises across all industries, it has seen an explosive uptick in customer demand and adoption, and it now serves more than 400 customers.

The funding round was led by New Enterprise Associates (NEA) and included existing Databricks investor Andreessen Horowitz. The new round brings Databricks' total funding to date to $107.5 million.

"Apache Spark has enabled countless enterprises and cutting-edge early adopters to create business value through advanced analytics solutions," said Ali Ghodsi, CEO and Co-Founder at Databricks. "As Spark's adoption and the demand for our managed Spark platform continues to rise, this funding will advance our engineering and go-to-market strategies to address all of our customer's pain points as we continue to grow the Spark community."

http://www.databricks.com

Friday, September 25, 2015

Databricks: Apache Spark Outgrowing Hadoop

Standalone deployments of Spark now outnumber those on YARN as more users run Spark independently of Hadoop, according to a newly published survey of Spark users conducted by Databricks, the company founded by the creators of Apache Spark.

Databricks said that users running Spark in standalone mode (48 percent of respondents) outnumber those running Spark on YARN (40 percent of respondents). The survey also found that 51 percent of respondents run Spark on a public cloud.

Key findings from the survey include:

  • Spark is outgrowing Hadoop: The most common Spark deployments according to the community are: 48 percent standalone, 40 percent YARN within Hadoop and 11 percent Apache Mesos. The number of Spark users who do not use any Hadoop components more than doubled from 2014 to 2015.
  • Streaming and advanced analytics uses rising: Spark is being used for an increasingly diverse set of applications, particularly by data scientists for machine learning, streaming and graph analysis use cases. In 2015, there are 56 percent more Spark Streaming users than in 2014. The production use of advanced analytics, like MLlib for machine learning and GraphX for graph processing, increased from 11 percent in 2014 to 15 percent in 2015. Seventy-five percent of Spark users are also using two or more Spark components (51 percent of Spark users are using three or more Spark components).
  • Spark users are becoming more diverse:  Of those surveyed, 41 percent identified themselves as Data Engineers, while 22 percent of respondents identified themselves as Data Scientists. Spark users are solving a variety of problems in different languages -- Scala (71 percent), Python (58 percent), SQL (36 percent), Java (31 percent) and R (18 percent) -- and all within the same framework.
  • Spark's most popular use cases come to light: Fifty-two percent use Spark for data warehousing, 68 percent use it for business intelligence, 40 percent for processing application and system logs, 48 percent to build recommendation engines, 36 percent for user-facing services and 29 percent for fraud detection and security.
  • Spark is increasing access to big data: Ninety-one percent of those surveyed cite performance as their reason for adoption, while 77 percent cite ease of programming, 71 percent cite ease of deployment, 64 percent cite advanced analytics capabilities and 52 percent cite real-time streaming capabilities.

"The continued growth of Spark has been highly encouraging, as companies are going into production to obtain real business value, and they are doing so in a wide range of environments beyond Hadoop clusters," said Matei Zaharia, creator of Apache Spark and CTO of Databricks. "Databricks and our partners are 100 percent committed to the long-term growth of Spark and we'll continue to make improvements based on this survey data and our ongoing community feedback, to make the most complete big data analytics toolkit accessible to all businesses."

https://databricks.com

Wednesday, July 2, 2014

Support Grows for Apache Spark in Big Data Streaming

Cloudera, Databricks, IBM, Intel, and MapR announced their collaboration to collectively broaden the range of tools and technologies in the Hadoop ecosystem that leverage Apache Spark as an underlying processing engine.

Apache Spark is an open-source data analytics cluster computing framework that promises to run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.

The companies said their new collaboration builds on Spark's momentum to include several key Hadoop projects, starting with the Apache Hive SQL engine (Hive). Using Spark as the underlying execution engine, this effort will improve the performance of batch SQL jobs in Hive, while seamlessly maintaining compatibility with the core Hive code base. The companies are also investigating ways to adapt Apache Pig to leverage Spark, as well as other popular tools, such as Sqoop and Search.

http://www.cloudera.com/content/cloudera/en/about/press-center/press-releases/2014/07/01/community-effort-driving-standardization-of-apache-spark-through.html

http://spark.apache.org/

See also