
Commit ad3c347

feat: Add BigQuery data warehouse example (#179)
Co-authored-by: bharathkkb <bharathkrishnakb@gmail.com>
1 parent 0440257 commit ad3c347

21 files changed (+1363 / -2 lines)

examples/data_warehouse/README.md

+24
@@ -0,0 +1,24 @@
# Simple Example

This example illustrates how to use the `data_warehouse` module.

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| project\_id | The ID of the project in which to provision resources. | `string` | n/a | yes |

## Outputs

| Name | Description |
|------|-------------|
| lookerstudio_report_url | The URL of the Looker Studio report to open. |

<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->

To provision this example, run the following from within this directory:
- `terraform init` to get the plugins
- `terraform plan` to see the infrastructure plan
- `terraform apply` to apply the infrastructure build
- `terraform destroy` to destroy the built infrastructure
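The steps above can also be scripted. The sketch below is a hypothetical helper, not part of this commit, that runs the same commands from Python and passes the example's only required input, `project_id`, with Terraform's `-var` flag so that `plan` and `apply` do not prompt for it. `build_commands` and `provision` are illustrative names, and `terraform apply` still asks for interactive confirmation unless `-auto-approve` is added.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_commands(project_id: str) -> list[list[str]]:
    """Return the Terraform commands for this example, in execution order."""
    # -var supplies the example's only required input so that plan and
    # apply do not prompt for it interactively.
    var_flag = ["-var", f"project_id={project_id}"]
    return [
        ["terraform", "init"],
        ["terraform", "plan", *var_flag],
        ["terraform", "apply", *var_flag],
    ]


def provision(project_id: str, workdir: Path = Path(".")) -> None:
    """Run each command in `workdir`, stopping at the first non-zero exit."""
    for cmd in build_commands(project_id):
        # check=True raises CalledProcessError if a command fails.
        subprocess.run(cmd, cwd=workdir, check=True)
```

For example, `provision("my-project", Path("examples/data_warehouse"))`, where `my-project` stands in for a real project ID. Teardown would use the same `-var` flag with `terraform destroy`. Keeping command construction separate from execution makes the sequence easy to inspect or test without invoking Terraform.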

examples/data_warehouse/main.tf

+22
@@ -0,0 +1,22 @@
/**
 * Copyright 2021 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

module "data_warehouse" {
  source = "../../modules/data_warehouse"

  project_id = var.project_id
  region     = "us-central1"
}

examples/data_warehouse/outputs.tf

+25
@@ -0,0 +1,25 @@
/**
 * Copyright 2021 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

output "lookerstudio_report_url" {
  description = "The URL of the Looker Studio report to open."
  value       = module.data_warehouse.lookerstudio_report_url
}

output "bigquery_editor_url" {
  description = "The URL to launch the BigQuery editor with the sample query procedure opened."
  value       = module.data_warehouse.bigquery_editor_url
}
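After `terraform apply` finishes, both outputs defined above can be read with `terraform output`. A minimal sketch, assuming the standard `terraform output -json` shape, in which each output name maps to an object with `sensitive`, `type`, and `value` keys; `parse_outputs` and `read_outputs` are hypothetical helpers, not part of this commit.

```python
from __future__ import annotations

import json
import subprocess
from typing import Any


def parse_outputs(raw_json: str) -> dict[str, Any]:
    """Map each output name to its value in `terraform output -json` text."""
    outputs = json.loads(raw_json)
    # Each entry looks like {"sensitive": false, "type": "string", "value": ...}.
    return {name: meta["value"] for name, meta in outputs.items()}


def read_outputs(workdir: str = ".") -> dict[str, Any]:
    """Run `terraform output -json` in `workdir` and return the parsed values."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_outputs(result.stdout)
```

For instance, `read_outputs("examples/data_warehouse")["lookerstudio_report_url"]` would return the report URL once the example has been applied.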

examples/data_warehouse/variables.tf

+20
@@ -0,0 +1,20 @@
/**
 * Copyright 2021 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

variable "project_id" {
  description = "The ID of the project in which to provision resources."
  type        = string
}

examples/data_warehouse/versions.tf

+25
@@ -0,0 +1,25 @@
/**
 * Copyright 2021 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 4.52.0"
    }
  }
  required_version = ">= 0.13"
}
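The `~> 4.52.0` constraint above is Terraform's pessimistic operator: only the right-most listed component may increase, so it accepts 4.52.x provider releases (>= 4.52.0 and < 4.53.0) and rejects 4.53.0 and later. The check can be sketched in Python as follows; `satisfies_pessimistic` is a hypothetical helper that assumes plain numeric `MAJOR.MINOR[.PATCH]` versions (no pre-release suffixes) and a constraint with at least two components.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "4.52.0" into (4, 52, 0); assumes purely numeric components."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check `version` against a Terraform-style "~> X.Y[.Z]" constraint."""
    base = parse_version(constraint.strip().removeprefix("~>"))
    candidate = parse_version(version)
    # Exclusive upper bound: drop the right-most component and increment the
    # one before it, e.g. ~> 4.52.0 -> (4, 53) and ~> 0.13 -> (1,).
    upper = base[:-2] + (base[-2] + 1,)
    return candidate >= base and candidate[: len(upper)] < upper
```

The `required_version = ">= 0.13"` line, by contrast, is a plain lower bound with no upper limit.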

modules/data_warehouse/.gitignore

+1
@@ -0,0 +1 @@
*.zip

modules/data_warehouse/README.md

+88
@@ -0,0 +1,88 @@
# terraform-google-ssw

## Description
### tagline
This is an auto-generated module.

### detailed

The resources/services/activations/deletions that this module will create/trigger are:

- Creates a BigQuery dataset
- Creates a BigQuery table
- Creates a GCS bucket
- Loads the GCS bucket with data from https://pantheon.corp.google.com/marketplace/product/city-of-new-york/nyc-tlc-trips
- Provides SQL examples
- Creates a BigQuery ML model and runs inference with it
- Creates a Looker Studio report

### preDeploy
To deploy this blueprint you must have an active billing account and billing permissions.

## Documentation
- [Hosting a Static Website](https://cloud.google.com/storage/docs/hosting-static-website)

## Usage

Functional examples are included in the
[examples](./examples/) directory.

<!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| deletion\_protection | Whether or not to protect GCS resources from deletion when the solution is modified or changed. | `string` | `true` | no |
| enable\_apis | Whether or not to enable underlying APIs in this solution. | `string` | `true` | no |
| force\_destroy | Whether or not to protect BigQuery resources from deletion when the solution is modified or changed. | `string` | `false` | no |
| labels | A map of labels to apply to contained resources. | `map(string)` | <pre>{<br> "edw-bigquery": true<br>}</pre> | no |
| project\_id | Google Cloud Project ID | `string` | n/a | yes |
| region | Google Cloud Region | `string` | n/a | yes |

## Outputs

| Name | Description |
|------|-------------|
| ds\_friendly\_name | Dataset name |
| function\_uri | Function URI |
| lookerstudio\_report\_url | The URL to create a new Looker Studio report that displays a sample dashboard for the taxi data analysis |
| bigquery\_editor\_url | The URL to launch the BigQuery editor with the sample query procedure opened |

<!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->

## Requirements

These sections describe requirements for using this module.

### Software

The following dependencies must be available:

- [Terraform][terraform] v0.13
- [Terraform Provider for GCP][terraform-provider-gcp] plugin v3.0

### Service Account

A service account with the following roles must be used to provision
the resources of this module:

- Storage Admin: `roles/storage.admin`

The [Project Factory module][project-factory-module] and the
[IAM module][iam-module] may be used in combination to provision a
service account with the necessary roles applied.

### APIs

A project with the following APIs enabled must be used to host the
resources of this module:

- Google Cloud Storage JSON API: `storage-api.googleapis.com`

The [Project Factory module][project-factory-module] can be used to
provision a project with the necessary APIs enabled.


## Security Disclosures

Please see our [security disclosure process](./SECURITY.md).
@@ -0,0 +1,95 @@
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the 'License');
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an 'AS IS' BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# [START functions_cloudevent_storage]
import functions_framework
import os


# Triggered by a change in a storage bucket
@functions_framework.cloud_event
def bq_sp_transform(cloud_event):

    gcs_export_bq()

    gcs_copy()

    bq_one_time_sp()


def bq_one_time_sp():

    PROJECT_ID = os.environ.get("PROJECT_ID")

    from google.cloud import bigquery

    client = bigquery.Client()

    query_string = f"""
    CALL `{PROJECT_ID}.ds_edw.sp_provision_lookup_tables`();
    CALL `{PROJECT_ID}.ds_edw.sp_lookerstudio_report`();
    CALL `{PROJECT_ID}.ds_edw.sp_bigqueryml_model`();
    """
    query_job = client.query(query_string)

    query_job.result()


def gcs_export_bq():

    from google.cloud import bigquery
    client = bigquery.Client()
    EXPORT_BUCKET_ID = os.environ.get("EXPORT_BUCKET_ID")

    destination_uri = "gs://{}/{}".format(EXPORT_BUCKET_ID, "taxi-*.Parquet")
    job_config = bigquery.job.ExtractJobConfig()
    job_config.compression = bigquery.Compression.GZIP
    job_config.destination_format = "PARQUET"

    extract_job = client.extract_table(
        'bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2022',
        destination_uri,
        # Location must match that of the source table.
        location="US",
        job_config=job_config,
    )  # API request
    extract_job.result()  # Waits for job to complete.


def gcs_copy():

    EXPORT_BUCKET_ID = os.environ.get("EXPORT_BUCKET_ID")
    BUCKET_ID = os.environ.get("BUCKET_ID")

    from google.cloud import storage

    storage_client = storage.Client()

    source_bucket = storage_client.bucket(EXPORT_BUCKET_ID)
    destination_bucket = storage_client.bucket(BUCKET_ID)

    blobs = storage_client.list_blobs(EXPORT_BUCKET_ID)

    blob_list = []
    # Note: The call returns a response only when the iterator is consumed.
    for blob in blobs:
        blob_list.append(blob.name)
        print(blob.name)

    for blob in blob_list:
        source_blob = source_bucket.blob(blob)

        source_bucket.copy_blob(
            source_blob, destination_bucket, blob,
        )
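`gcs_copy()` above lists the export bucket, collects the object names, then copies each object to the destination bucket under the same name. To exercise that flow without credentials, the client can be passed in as a parameter. The sketch below is a hypothetical refactoring: `copy_all_blobs`, `FakeClient`, `FakeBucket`, and `FakeBlob` are not part of this commit, and the fakes implement only the three calls the function relies on (`bucket`, `list_blobs`, `copy_blob`). The sample object names are illustrative and follow BigQuery's wildcard export naming, in which `*` is replaced by a zero-padded 12-digit shard number.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def copy_all_blobs(storage_client, source_bucket_name: str, destination_bucket_name: str) -> list[str]:
    """Copy every object between buckets, keeping names; mirrors gcs_copy()."""
    source_bucket = storage_client.bucket(source_bucket_name)
    destination_bucket = storage_client.bucket(destination_bucket_name)
    # list_blobs() is lazy; collect the names first, as gcs_copy() does.
    names = [blob.name for blob in storage_client.list_blobs(source_bucket_name)]
    for name in names:
        source_bucket.copy_blob(source_bucket.blob(name), destination_bucket, name)
    return names


# Hypothetical in-memory stand-ins for the storage client, bucket, and blob.
@dataclass
class FakeBlob:
    name: str


@dataclass
class FakeBucket:
    name: str
    store: dict[str, set[str]]  # shared map: bucket name -> object names

    def blob(self, name: str) -> FakeBlob:
        return FakeBlob(name)

    def copy_blob(self, blob: FakeBlob, destination_bucket: FakeBucket, new_name: str) -> None:
        self.store.setdefault(destination_bucket.name, set()).add(new_name)


@dataclass
class FakeClient:
    store: dict[str, set[str]] = field(default_factory=dict)

    def bucket(self, name: str) -> FakeBucket:
        return FakeBucket(name, self.store)

    def list_blobs(self, bucket_name: str) -> list[FakeBlob]:
        return [FakeBlob(n) for n in sorted(self.store.get(bucket_name, set()))]


client = FakeClient({"export-bucket": {"taxi-000000000000.Parquet", "taxi-000000000001.Parquet"}})
copied = copy_all_blobs(client, "export-bucket", "data-bucket")
```

With a real client, `copy_all_blobs(storage.Client(), os.environ["EXPORT_BUCKET_ID"], os.environ["BUCKET_ID"])` would follow the same steps as `gcs_copy()`, minus the `print` calls.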
@@ -0,0 +1,2 @@
google-cloud-bigquery==3.5.0
google-cloud-storage==2.7.0
@@ -0,0 +1,40 @@
-- Copyright 2023 Google LLC
--
-- Licensed under the Apache License, Version 2.0 (the "License");
-- you may not use this file except in compliance with the License.
-- You may obtain a copy of the License at
--
--     http://www.apache.org/licenses/LICENSE-2.0
--
-- Unless required by applicable law or agreed to in writing, software
-- distributed under the License is distributed on an "AS IS" BASIS,
-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-- See the License for the specific language governing permissions and
-- limitations under the License.

/* Run a query to see the prediction results of the model
--
select * from ML.PREDICT(MODEL ds_edw.model_taxi_estimate,
  TABLE ds_edw.taxi_trips)
  limit 1000; */

--Model Example
CREATE OR REPLACE MODEL
  `${project_id}.ds_edw.model_taxi_estimate` OPTIONS ( MODEL_TYPE='LINEAR_REG',
    LS_INIT_LEARN_RATE=0.15,
    L1_REG=1,
    MAX_ITERATIONS=5 ) AS
SELECT
  pickup_datetime,
  dropoff_datetime,
  IFNULL(passenger_count,0) passenger_count,
  IFNULL(trip_distance,0) trip_distance,
  IFNULL(rate_code,'') rate_code,
  IFNULL(payment_type,'') payment_type,
  IFNULL(fare_amount,0) label,
  IFNULL(pickup_location_id,'') pickup_location_id,
  IFNULL(dropoff_location_id,'') dropoff_location_id
FROM
  `${project_id}.ds_edw.taxi_trips`
WHERE
  fare_amount > 0;
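The SELECT above prepares the training data in two ways: the WHERE clause keeps only trips with a positive `fare_amount`, and `IFNULL` replaces missing feature values with `0` or `''`. `fare_amount` is exposed as `label`, the column name BigQuery ML uses as the training target by default. The same row-level logic can be sketched in Python for in-memory rows; `training_rows` and `NULL_DEFAULTS` are illustrative names, and rows are assumed to be dicts keyed by column name.

```python
from __future__ import annotations

from typing import Any

# Default substituted for NULL in each feature column, matching the IFNULL()
# calls in the CREATE MODEL statement above.
NULL_DEFAULTS = {
    "passenger_count": 0,
    "trip_distance": 0,
    "rate_code": "",
    "payment_type": "",
    "pickup_location_id": "",
    "dropoff_location_id": "",
}


def training_rows(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Apply the SELECT's NULL handling and WHERE filter to in-memory rows."""
    prepared = []
    for row in rows:
        fare = row.get("fare_amount")
        # WHERE fare_amount > 0: a NULL fare also fails the comparison in SQL.
        if fare is None or fare <= 0:
            continue
        out = {
            "pickup_datetime": row.get("pickup_datetime"),
            "dropoff_datetime": row.get("dropoff_datetime"),
            # fare_amount is passed to the model under the column name `label`.
            "label": fare,
        }
        for column, default in NULL_DEFAULTS.items():
            value = row.get(column)
            out[column] = default if value is None else value
        prepared.append(out)
    return prepared
```

Because rows with a NULL or non-positive fare are filtered out, the `IFNULL(fare_amount,0)` in the SQL never changes a value that reaches the model.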
