SBTB 2014, Tathagata Das: Large scale, real-time stream processing using Spark Streaming

Description:

Spark Streaming is an extension to the Spark cluster computing framework that enables high-speed, fault-tolerant stream processing through a high-level Scala API. It builds on a new execution model called "discretized streams" to provide exactly-once processing without the heavy cost of transactions required by previous systems (e.g. Storm), allowing it to process significantly higher rates of data per node while still recovering from faults in seconds. It also greatly simplifies stream programming by providing a set of functional, high-level operators (e.g. maps, filters, and windows). Perhaps the most exciting feature of Spark Streaming, however, is that it combines seamlessly with Spark's interactive and batch processing features, allowing ad-hoc queries on stream state and programs that combine streaming and historical data to do online machine learning and graph processing. Spark Streaming scales linearly to 100 nodes and has been used to build applications including session-level metrics reporting and online machine learning.
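To make the "discretized streams" idea above more concrete, here is a minimal sketch in plain Scala (standard library only, no Spark dependency). It does not use or imitate the Spark Streaming API: the object `DStreamSketch`, the `Event` type, and the helpers `discretize`, `wordCount`, and `windowedCount` are hypothetical names chosen for illustration. It assumes non-negative millisecond timestamps and shows the three ideas the description names: chopping a stream into fixed intervals, running a map/filter-style transformation on each interval as a small batch job, and combining the last few intervals into a sliding window.

```scala
// Toy model of "discretized streams" (hypothetical helpers, NOT the Spark Streaming API).
// The input stream is cut into fixed-length time intervals, and each interval is
// processed as a small, deterministic batch computation.
object DStreamSketch {
  // A record tagged with its arrival time in milliseconds (assumed non-negative).
  final case class Event(timeMs: Long, line: String)

  // Group events into batches keyed by interval index k, covering [k*intervalMs, (k+1)*intervalMs).
  def discretize(events: Seq[Event], intervalMs: Long): Map[Long, Seq[String]] =
    events.groupBy(_.timeMs / intervalMs).map { case (k, es) => k -> es.map(_.line) }

  // Per-batch transformation: a word count built from flatMap/filter/groupBy operators.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (w, ws) => w -> ws.size }

  // Sliding window: merge the per-batch counts of the last `windowBatches` intervals
  // ending at batch index `end` (inclusive). Missing intervals count as empty batches.
  def windowedCount(batches: Map[Long, Seq[String]], end: Long, windowBatches: Int): Map[String, Int] = {
    val perBatch = (end - windowBatches + 1 to end).map(k => wordCount(batches.getOrElse(k, Seq.empty)))
    perBatch.foldLeft(Map.empty[String, Int]) { (acc, counts) =>
      counts.foldLeft(acc) { case (a, (w, c)) => a.updated(w, a.getOrElse(w, 0) + c) }
    }
  }
}
```

For example, with a 1000 ms interval, events at 0 ms ("a b") and 500 ms ("a") fall into batch 0, so `wordCount` on that batch gives `a -> 2, b -> 1`. Because each batch result here is a pure function of that interval's input, a lost result could in principle be recomputed from the same input, which is the intuition behind the description's claim that discretized streams avoid heavyweight transactions. Spark Streaming itself represents each batch as a Spark dataset and relies on Spark's own recovery mechanisms, which this sketch does not model.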

Tathagata Das (TD) is a graduate student at AMPLab and the original author of Spark Streaming.

Scalæ By the Bay 2016 conference (http://scala.bythebay.io) is held on November 11-13, 2016 at Twitter, San Francisco, to share the best practices in building data pipelines with three tracks:

* Functional and Type-safe Programming
* Reactive Microservices and Streaming Architectures
* Data Pipelines for Machine Learning and AI

YouTube url:	https://www.youtube.com/watch?v=mjXAa9Xrq6U&t=0s
Created:	10. 2. 2017 21:34:12
