In this post I will assume that you have already started playing with Google Cloud Workflows, and that you liked it so much that its reference documentation no longer holds any secrets for you.
Please note that every sentence I quote below is copied verbatim from that documentation.
One of Google Workflows' useful architecture patterns is handling long-running jobs by polling for their status. It is well explained, along with two other patterns, by the Workflows Product Manager on the Google Cloud Blog, here.
A typical use case for this pattern is polling the status of a BigQuery job, where we launch the job, repeatedly check its state, and wait between checks until it completes, as sketched below.
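To make the pattern concrete, here is a minimal sketch of such a workflow. It is only an illustration, not the exact example from the blog post: the BigQuery REST endpoints, the placeholder query, and the 10-second polling interval are assumptions I chose for the sketch.

```yaml
# Minimal polling workflow: start a BigQuery query job, then poll its
# status every 10 seconds until BigQuery reports it as DONE.
main:
  params: [args]   # expects args.project, the GCP project that runs the job
  steps:
    - startJob:
        call: http.post
        args:
          url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/" + args.project + "/jobs"}
          auth:
            type: OAuth2
          body:
            configuration:
              query:
                query: "SELECT 1"   # placeholder query for the sketch
                useLegacySql: false
        result: insertResponse
    - getJobStatus:
        call: http.get
        args:
          url: ${"https://bigquery.googleapis.com/bigquery/v2/projects/" + args.project + "/jobs/" + insertResponse.body.jobReference.jobId}
          auth:
            type: OAuth2
        result: jobStatus
    - checkDone:
        switch:
          - condition: ${jobStatus.body.status.state == "DONE"}
            next: returnResult
        next: wait
    - wait:
        call: sys.sleep
        args:
          seconds: 10          # polling interval chosen arbitrarily for the sketch
        next: getJobStatus
    - returnResult:
        return: ${jobStatus.body.status}
```

The sys.sleep step is what keeps this pattern cheap: the workflow simply pauses between checks instead of busy-waiting, and you are billed per executed step rather than for the waiting time.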
Our data engineering pipelines load data, transform it, and prepare it. All the prepared data is stored in one dataset in BigQuery, hosted in a single GCP project. Nothing fancy here!
We have two groups of users for this prepared data: data scientists and data analysts. Each group belongs to a different business unit with its own GCP project, separate from ours.
Although we are excited 🤩 😍 to provide this prepared data to these groups, we have to pay attention to two points:
I love Cloud Dataprep: first, it offers a large set of transformations on your raw data without the need to write “any line of code”. Second, it integrates well with Google Cloud; in fact, Cloud Dataprep jobs run as Cloud Dataflow jobs that read from and write to Google Cloud Storage and/or Google BigQuery.
With Cloud Dataprep you create a flow. A flow design starts by importing one or more datasets. A dataset can be a local file uploaded to Cloud Dataprep, a file in Cloud Storage, or a table in BigQuery. Then you add one…
During this hackathon, candidates got to work with and discover Google Cloud services while implementing their solutions. As organizers (Umanis), we offered a budget of 500 EUR for each project, usable for the two months of the hackathon. Google Cloud, our partner, offered a limited credit that can be attached to one billing account. …