Recover Quarantine using releases #152

Merged on Jan 5, 2024 (20 commits)
Changes from 10 commits
2 changes: 1 addition & 1 deletion README.md
@@ -16,7 +16,7 @@ This repository contains all the use cases you can iterate with Versions:
- [Change S3 Data Source sorting key](change_sorting_key_to_s3_data_source)
- [Change S3 Data Source sorting key with reingestion](change_sorting_key_to_s3_data_source_with_reingestion)
- [Change Kafka Data Source sorting key](change_sorting_key_to_kafka_data_source)
- [Recover data from quarantine](recover_data_from_quarantine) using a copy Pipe
- [Recover data from quarantine](v3/Quarentine_V3) using a copy Pipe
- [Add a new column to a Landing Data Source](add_column_landing_ds)
- [Add a new column to a Materialized View](add_column_materialized_view)
- [Add a new column to a Materialized View using Releases](v3/add_new_column_to_a_materialized_view_v3)
2 changes: 1 addition & 1 deletion v3/Quarentine_V3/.tinyenv
@@ -1,2 +1,2 @@
TB_VERSION_WARNING=0
VERSION=0.0.0
VERSION=0.0.1
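
The version bumped here is the same value that the deploy scripts in this PR pass to `--semver`. As a minimal sketch, assuming the `.tinyenv` file keeps the simple `KEY=VALUE` layout shown above, a script could read the value instead of hard-coding it. The `read_tinyenv_version` helper is hypothetical and not part of the Tinybird CLI; the demo only prints the command it would run.

```shell
#!/bin/bash
# Hypothetical helper: print the VERSION value from a .tinyenv-style file.
# Assumes one KEY=VALUE pair per line, as in the diff above.
read_tinyenv_version() {
    local file="$1"
    # Print only the value of the line that starts with VERSION=.
    sed -n 's/^VERSION=//p' "$file" | head -n 1
}

# Demo on a temporary file with the contents shown in this diff.
tmp_env="$(mktemp)"
printf 'TB_VERSION_WARNING=0\nVERSION=0.0.1\n' > "$tmp_env"
version="$(read_tinyenv_version "$tmp_env")"
echo "would run: tb --semver ${version} deploy --v3"
rm -f "$tmp_env"
```

Anchoring the pattern on `^VERSION=` keeps the `TB_VERSION_WARNING` line from matching.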
17 changes: 15 additions & 2 deletions v3/Quarentine_V3/README.md
@@ -1,3 +1,16 @@
# Tinybird Versions - {{ YOUR USE CASE NAME HERE }}
# Recover data from quarantine V3

Work in progress ...
When data ends up in quarantine, you can fix it and re-ingest it using a Copy Pipe. Create a [Pull Request](https://github.com/tinybirdco/use-case-examples/pull/152) that follows these steps:

> Remember to follow the [instructions](../README.md) to set up your Tinybird Data Project before jumping into the use-case steps.

- Bump the CI/CD version and generate the deployment scripts with `tb release generate --semver 0.0.1`.
- In the CI file:
  - Append incorrect data to `analytics_events` using a fixture. This is required to create the quarantine tables.
  - Run `set +e` before appending the incorrect data; otherwise the pipeline finishes with an error.
- Create a Copy Pipe that fixes the incorrect data and re-ingests it into `analytics_events`.
- In the CD file, you only need to run the Copy Pipe after it is created.
- The temporary Copy Pipe is created inside a Release (0.0.1). Once the data is migrated, it is safe to remove the Release:
```
tb release rm --semver 0.0.1 --force --yes
```

5 changes: 5 additions & 0 deletions v3/Quarentine_V3/deploy/0.0.1/cd-deploy.sh
@@ -0,0 +1,5 @@
#!/bin/bash
set -e

tb --semver 0.0.1 deploy --v3
tb --semver 0.0.1 pipe copy run analytics_events_quarantine_to_final --wait --yes
11 changes: 11 additions & 0 deletions v3/Quarentine_V3/deploy/0.0.1/ci-deploy.sh
@@ -0,0 +1,11 @@
#!/bin/bash
set +e

# For demo purposes only: append incorrect data to the analytics_events Data Source.
# The command is expected to fail, which is why fail-fast is disabled (set +e) around it.
tb datasource append analytics_events datasources/fixtures/analytics_events_errors.ndjson

set -e

tb --semver 0.0.1 deploy --fixtures --v3
tb --semver 0.0.1 pipe copy run analytics_events_quarantine_to_final --wait --yes
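
The `set +e` / `set -e` pair above turns off fail-fast only around the fixture append. A minimal sketch of an alternative that scopes the tolerance to a single command and keeps its exit status for later inspection is shown below. `run_tolerant` and `LAST_STATUS` are hypothetical names, and `false` stands in for the failing `tb datasource append` call so the sketch runs without the Tinybird CLI.

```shell
#!/bin/bash
set -e

# Hypothetical helper: run a command without aborting the script and
# store the command's exit status in LAST_STATUS.
run_tolerant() {
    LAST_STATUS=0
    "$@" || LAST_STATUS=$?
}

# `false` stands in for the append of invalid rows, which is expected to fail.
run_tolerant false
echo "append exit status: ${LAST_STATUS} (non-zero is expected here)"

# Fail-fast stays active for every command not wrapped in run_tolerant.
run_tolerant true
echo "next step exit status: ${LAST_STATUS}"
```

Because `set -e` is never switched off, a forgotten `set -e` after the tolerated command cannot silently disable error handling for the rest of the script.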

14 changes: 14 additions & 0 deletions v3/Quarentine_V3/pipes/analytics_events_quarantine_to_final.pipe
@@ -0,0 +1,14 @@
NODE copy_quarantine
SQL >
    SELECT
        toDateTime(
            fromUnixTimestamp64Milli(toUInt64(replaceAll(assumeNotNull(timestamp), '"', '')) * 1000)
        ) timestamp,
        replaceAll(assumeNotNull(session_id), '"', '') session_id,
        replaceAll(assumeNotNull(action), '"', '') action,
        replaceAll(assumeNotNull(version), '"', '') version,
        replaceAll(assumeNotNull(payload), '"', '') payload
    FROM analytics_events_quarantine

TYPE COPY
TARGET_DATASOURCE analytics_events
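
To make the per-column cleanup concrete, the sketch below mirrors two steps of the query in plain Bash: removing the double quotes (the `replaceAll(..., '"', '')` calls) and multiplying the epoch-seconds value by 1000 before it reaches `fromUnixTimestamp64Milli`. The function names and sample values are hypothetical, and the quarantined values are assumed to be quoted strings, as the query implies.

```shell
#!/bin/bash
# Remove every double quote from a value, like replaceAll(x, '"', '').
strip_quotes() {
    printf '%s' "$1" | tr -d '"'
}

# Convert a quoted epoch-seconds string to epoch milliseconds,
# like toUInt64(replaceAll(x, '"', '')) * 1000 in the query.
seconds_to_millis() {
    local seconds
    seconds="$(strip_quotes "$1")"
    printf '%s' "$(( seconds * 1000 ))"
}

# Hypothetical sample values for illustration only.
strip_quotes '"add_to_cart"'; echo
seconds_to_millis '"1700000000"'; echo
```

The sketch covers only the string and arithmetic steps; the conversion to a `DateTime` value is left to ClickHouse in the actual pipe.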