Data Pipeline

Our world is complex, but we are always looking for ways to understand it better. I am interested in whether we can put a process in place to make sense of the vast amounts of data that humanity now collects. We will look at high-level data and see whether we can draw inferences from it. Along the way, we will explore how to build pipelines that ingest and interpret this data.

Ingesting Data

Data ingestion, or ETL (extract, transform, load), is how we get data into our system. This is not trivial and can be a major source of headaches. Data from public sources rarely arrives in a consistent format, so it needs to be transformed before it is useful to us.
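
To make the format problem concrete, here is a minimal sketch of the extract, transform, and load steps in Python. Everything in it is an assumption made for illustration: the two sample sources, their field names, the placeholder values, and the in-memory list standing in for a real data store. It only shows one way that differently shaped inputs might be mapped onto a common schema.

```python
import csv
import io
import json

# Hypothetical raw inputs: two sources describing the same kind of data
# with different formats and field names. The values are placeholders.
CSV_SOURCE = "country,year,population\nExampleland,2020,1000\n"
JSON_SOURCE = '[{"name": "Samplestan", "yr": "2020", "pop": "2500"}]'


def extract_csv(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Extract: parse JSON text into a list of dict rows."""
    return json.loads(text)


def transform(row, mapping):
    """Transform: rename source fields to a common schema and cast types."""
    return {
        "country": row[mapping["country"]],
        "year": int(row[mapping["year"]]),
        "population": int(row[mapping["population"]]),
    }


def load(records, store):
    """Load: append normalized records to an in-memory store."""
    store.extend(records)
    return store


def ingest():
    """Run extract, transform, and load for both sample sources."""
    store = []
    # Each mapping goes from the common field name to the source field name.
    csv_map = {"country": "country", "year": "year", "population": "population"}
    json_map = {"country": "name", "year": "yr", "population": "pop"}
    load([transform(r, csv_map) for r in extract_csv(CSV_SOURCE)], store)
    load([transform(r, json_map) for r in extract_json(JSON_SOURCE)], store)
    return store


if __name__ == "__main__":
    for record in ingest():
        print(record)
```

Keeping the per-source differences in a small mapping, rather than in the transform logic itself, means a new source with the same kind of fields can be added by writing one more mapping and one more extract function.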

Single Stage Pipeline

Multi-stage Pipeline

Further Reading