What I learnt today — Lambda and Airflow

Ed Springer
Mar 8, 2019


Thanks to the great article by Insight Data.

My takeaways:

Lambda architecture handles data in real time while still guaranteeing data integrity. It is horizontally scalable and assumes that things will go wrong.

Real-time data is served as it arrives. Where needed, it is later corrected by a batch process that rebuilds the results from the beginning of the raw data. Typically there are three tables: one that records the real-time data as it arrives, one that stores the results of the last batch run, and one that stores only the delta of the values that have changed since the last batch run.
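To check my understanding, here is a toy, in-memory sketch of those three tables in Python. It is not a real Lambda deployment. The class, method and key names are made up for illustration, and it assumes a single process with no events arriving while the batch job runs. A query adds the last batch result to the delta. The batch job recomputes everything from the raw events and then clears the delta, so any mistake in the real-time path gets overwritten.

```python
from collections import defaultdict


class LambdaStore:
    """Toy model of the three tables (all names are hypothetical)."""

    def __init__(self):
        self.raw_events = []                # table 1: every event, as it arrives
        self.batch_view = {}                # table 2: results of the last batch run
        self.delta_view = defaultdict(int)  # table 3: changes since that run

    def ingest(self, key, amount):
        # Keep the raw event for future recomputation, and update the
        # real-time delta so queries see it immediately.
        self.raw_events.append((key, amount))
        self.delta_view[key] += amount

    def run_batch(self):
        # Rebuild every total from the beginning of the raw data, then
        # reset the delta. Assumes no events arrive while this runs.
        totals = defaultdict(int)
        for key, amount in self.raw_events:
            totals[key] += amount
        self.batch_view = dict(totals)
        self.delta_view.clear()

    def query(self, key):
        # Serve the last batch result plus whatever changed since.
        return self.batch_view.get(key, 0) + self.delta_view.get(key, 0)


store = LambdaStore()
store.ingest("page_a", 3)
store.ingest("page_a", 2)
print(store.query("page_a"))  # 5, served from the delta alone
store.run_batch()
store.ingest("page_a", 1)
print(store.query("page_a"))  # 6 = batch result 5 + delta 1
```

Because the batch run always starts from the raw events, a corrupted delta only stays wrong until the next run.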

Airflow is a workflow engine for authoring, scheduling and monitoring batch processes (daily jobs, hourly jobs). It can, however, also run jobs at one-minute or five-minute intervals.

Now I am curious about:

- How costly (infrastructure, productivity) is it for a batch process to run?

- What is Plan B when the batch process crashes?

Did I get that right? Or is there a simpler way to understand this?


Written by Ed Springer

Dad. Husband. Friend. Mate. Son. Curious about the business of tech. Passionate about photography. Student of life.
