Today Google Cloud Platform announced the general availability of two new data analytics products, Google Cloud Dataflow and Google Cloud Pub/Sub. Together they provide a unified programming model for data analysis that removes the need for separate systems for batch and stream data sources, while also integrating applications and services and analyzing data streams in real time. Google is also announcing the release of Cloudera Director 1.5.
With the general availability of Google Cloud Dataflow and Google Cloud Pub/Sub, Google has completed its line of big data tools, which began with BigQuery. Google is removing the beta labels and expects thousands of terabytes of data to be analyzed with these new products right away. Google Cloud Dataflow and Google Cloud Pub/Sub enable data processing without the operational burden typically found in such systems. They also let customers build applications on a platform that scales with their needs while offering low latency and high reliability.
Google Cloud Dataflow removes the complexity of developing separate systems for batch and streaming data sources by providing a unified programming model. Cloud Dataflow eliminates the overhead related to large-scale cluster management and optimization.
Benefits include:
- A fully managed, fault-tolerant, highly available, SLA-backed service for batch and stream processing.
- A comprehensive model for balancing correctness, latency, and cost when dealing with unordered data at massive scale. These concepts power key elements of the Cloud Dataflow programming model.
- Performance that is 2-3x faster and cheaper than Hadoop when evaluating classic MapReduce-based pipelines, such as PageRank and WordCount. With dynamic work rebalancing, Cloud Dataflow also optimizes resource utilization, which provides additional performance gains without requiring manual intervention.
- An extensible SDK. Google has expanded its technology partner, third-party connector, and service provider integration efforts, including Tamr, Salesforce, Clearstory, springML, Cloudera, and data Artisans. Google also continues to support alternate runner enablement for Apache Spark and Apache Flink.
- Native Google Cloud Platform integration for Cloud Storage, Cloud Datastore, BigQuery, and Cloud Pub/Sub.
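For readers unfamiliar with WordCount, the benchmark pipeline mentioned above, the idea can be sketched in a few lines of plain Python. This is only an illustration and does not use the Cloud Dataflow SDK; the function names (map_words, reduce_counts, word_count) are hypothetical. It does hint at what a unified model means in practice: the same logic accepts either a finite list of lines (batch) or a generator that yields lines as they arrive (stream-like).

```python
import re
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def map_words(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in re.findall(r"[a-z']+", line.lower()):
            yield word, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the counts emitted for each distinct word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Illustrative pipeline: works on a list (batch) or a generator (stream-like)."""
    return reduce_counts(map_words(lines))


if __name__ == "__main__":
    print(word_count(["the quick fox", "The lazy dog and the fox"]))
```

A managed service would distribute the map and reduce steps across many workers; this sketch runs them in a single process purely to show the shape of the computation.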
Google Cloud Pub/Sub is the result of a decade of innovation. It can integrate applications and services reliably, as well as analyze big data streams in real time. Cloud Pub/Sub delivered over 1 trillion messages during its alpha and beta run, which helped Google fine-tune its performance. It can also address a wide range of scenarios from a single API.
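The publish/subscribe pattern behind this kind of integration can be shown with a toy, in-process example. This is not the Cloud Pub/Sub API; the InMemoryBroker class and its methods are hypothetical, and the sketch omits everything a real service provides (durability, acknowledgements, retries, network delivery). It only shows how publishers and subscribers stay decoupled by communicating through named topics.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InMemoryBroker:
    """Hypothetical toy broker: routes messages from publishers to subscribers by topic."""

    def __init__(self) -> None:
        # Maps each topic name to the list of handlers subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        """Register a handler to be called for every message published to the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


if __name__ == "__main__":
    broker = InMemoryBroker()
    billing, shipping = [], []
    broker.subscribe("orders", billing.append)
    broker.subscribe("orders", shipping.append)
    delivered = broker.publish("orders", "order-1")
    print(delivered, billing, shipping)
```

The publisher never references the billing or shipping handlers directly, so either side can be added or removed without changing the other.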
Availability
Both Google Cloud Dataflow and Google Cloud Pub/Sub are available today.