What
Apache Beam (https://beam.apache.org/) is an open-source project for writing big-data pipelines.
In the first part of this talk, I’ll describe Beam from a non-technical perspective: what it is, why you would use it, and how it compares to other technologies in the big data space.
In the second part of the talk, I will give a high-level overview of the technical aspects of Beam. At its heart is a programming model that unifies batch and stream processing, allowing the programmer to separate the what, where, when, and how of processing. What: which processing is actually performed on the data? Where: in which event-time windows is that processing done? When: at what point in processing time are results materialised? How: how are updates to results (for example, due to late data) combined? Beam also provides several language-specific SDKs that instantiate the model for particular languages. Currently Java and Python are available, and Go is under development.
Beam also provides a portability framework that allows pipelines to be run on a variety of execution technologies. Beam itself provides a reference runner. There are also efforts to develop runners based on Apache Flink and Apache Spark. Google provides a commercial managed runner on Google Cloud. Beam builds on the work of MapReduce, Hadoop, Flume, Spark, and Flink.
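For readers who want a feel for the what/where/when/how split before the talk, here is a minimal plain-Python sketch. It does not use the Beam SDK, and every name in it (`fixed_window_start`, `windowed_sums`, the sample `events`) is hypothetical and chosen only for illustration. It assumes fixed 60-second event-time windows and materialises results once, after all input has been read, so the "when" question is only trivially answered here.

```python
from collections import defaultdict


def fixed_window_start(event_time, size):
    """Where: map an event timestamp to the start of its fixed window."""
    return event_time - (event_time % size)


def windowed_sums(events, size):
    """What: sum the values per key within each event-time window.

    `events` is an iterable of (key, value, event_time) tuples in arrival
    order, which may differ from event-time order.
    """
    sums = defaultdict(int)
    for key, value, event_time in events:
        # How: a late element is accumulated into its window's running sum.
        sums[(key, fixed_window_start(event_time, size))] += value
    # When: in this sketch, results are emitted once, after all input is read.
    return dict(sums)


# Hypothetical sample data: (key, value, event_time in seconds).
events = [
    ("a", 1, 5),
    ("a", 2, 65),
    ("b", 4, 10),
    ("a", 3, 50),  # arrives late: its event time falls in the first window
]
result = windowed_sums(events, 60)
```

The late element for key "a" lands in the window starting at 0 even though it arrives after an element from the next window, which is the event-time versus processing-time distinction the talk describes.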
Speaker Bio
Neal Glew is a software engineer in the Flume project at Google, where he mostly works on the shuffle system. He previously worked at Intel on parallel programming models within Intel Labs. He has a PhD in computer science from Cornell University and a BSc(hons) in computer science from Victoria University of Wellington.
Data Driven Wellington Meetup Group
There’s so much going on in the world of data that it can be hard to keep up with what’s happening in your own speciality area let alone make connections to others who might have complementary skills or interests. This Meetup is intended to make it easier to stay informed and to make those connections. Its focus is on what people working with data in Wellington-based public, private, non-profit, and academic organisations are doing, what challenges they’re experiencing and what they need help with. It welcomes members who spend their days capturing, storing, manipulating and analysing data as well as those who use data generated by others for decision- and policy-making.
When and Where
Thursday, 14 March 2019
5:30pm – 7:30pm
Rutherford House
23 Lambton Quay
Wellington
Rutherford House is the tall building between the Beehive, the Railway Station, and the Old Government Building. The Meetup will be in VicBooks Cafe, on the Bunny Street side of the ground floor.
How Much
Free
More
Writing Big Data Pipelines: the Apache Beam Project
Data Driven Wellington Meetup