Until recently, it was difficult to automate complex data pipelines built on Google Analytics 4 data. That has changed with Dataform, a Google Cloud Platform service for building and automating data pipelines.
Dataform is an open-source platform that Google acquired in 2020. It became available in Google Cloud Platform a few months ago, and its purpose remains the same.
Dataform helps data specialists or teams build data pipelines, version code in Git, and orchestrate workflows.
What are the advantages of Dataform?
The creation of data pipelines
If you are serious about data, you can’t avoid creating data pipelines. Without Dataform, building such pipelines is complex and cumbersome, because you will probably reach for scheduled queries. You then have to express every dependency manually through timing, scheduling each query late enough for its inputs to have finished, and this becomes time-consuming for more complex pipelines.
With Dataform, you only need to define the dependencies between tables, and the correct execution order follows from them.
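To illustrate why declared dependencies are enough, here is a minimal Python sketch of the idea. It is not Dataform code and does not use any Dataform API; the table names are hypothetical. It only shows that once each table lists the tables it reads from, a valid run order can be derived automatically instead of being guessed through schedule times.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical tables: each key maps to the set of tables it reads from.
dependencies = {
    "stg_events": {"raw_events"},
    "sessions": {"stg_events"},
    "orders": {"stg_events"},
    "ecommerce_flat": {"sessions", "orders"},
}


def execution_order(deps):
    """Return a run order in which every table comes after all of its inputs."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Adding a new table to such a pipeline means adding one entry with its inputs; nothing else has to be rescheduled.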
The structure of data changes constantly, so as data specialists we have to update our code regularly. Dataform offers another great feature for this purpose: version control.
With it, you can, for example, follow the same review and release process that developers use before shipping a change or new code.
Google Cloud Platform ecosystem
A major advantage of Dataform is that it is already part of GCP, which means it can be combined with other GCP services.
For example, you can use the Cloud Logging Log Router, Pub/Sub, and Workflows to trigger a Dataform run when a table is created or updated in BigQuery.
Using Dataform is free. You will only pay for the amount of data processed.
What is the alternative to Dataform?
The biggest alternative to Dataform at the moment is dbt, which is especially popular in the data community. One reason is that it supports both SQL and Python.
The two tools offer nearly identical functionality. The main difference is pricing: dbt Cloud, the hosted SaaS version of dbt, charges a fee if you want to add more than one person.
Our Dataform package
At Optimics, we believe in contributing to the community, which is why we have created an open-source Dataform package. This package processes Google Analytics 4 e-commerce data in BigQuery into a flat table. This table can then be used as a data source for data visualization in Looker Studio or another visualization tool.
In addition to creating the flat table, the package handles the daily increment of data, so each run only processes data for the previous day.
If you are interested in this package, check out our GitHub for a description of the installation process.
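As a rough illustration of a previous-day increment, the Python sketch below computes the name of the daily table a run would read. It is an assumption for illustration, not the package's actual code: the helper name `previous_day_table` is hypothetical, and the only fact relied on is that the GA4 BigQuery export writes one table per day named `events_YYYYMMDD`.

```python
from datetime import date, timedelta


def previous_day_table(today: date, prefix: str = "events_") -> str:
    """Return the GA4 daily export table name for the day before `today`."""
    yesterday = today - timedelta(days=1)
    # GA4 daily export tables use the suffix YYYYMMDD.
    return f"{prefix}{yesterday:%Y%m%d}"


print(previous_day_table(date(2023, 3, 1)))  # events_20230228
```

Restricting each run to a single daily table keeps the amount of processed data, and therefore the cost, small compared with rebuilding the whole flat table every day.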