Merge pull request #380 from tinybirdco/readme-updates-377
Readme updates for backfill #377
alrocar authored Feb 26, 2025
2 parents 57a30a9 + df55318 commit 04e6c6b
Showing 1 changed file with 17 additions and 7 deletions.
24 changes: 17 additions & 7 deletions change_column_type_materialized_view/README.md

@@ -6,9 +6,9 @@ To change a column type in a Materialized View Data Source is a process that nee

This change requires re-creating the Materialized View and populating it again with all the data, without stopping ingestion.

For that, the steps will be:

1. Create a new Materialized View (Pipe and Data Source) to change the column type.
2. Run CI.
3. Backfill the new Materialized View with the data ingested before its creation.
4. Run CD and run the backfill in the main Workspace.
@@ -48,7 +48,7 @@ Create a Copy Pipe `analytics_pages_backfill.pipe` for backfilling purposes:

```
NODE analytics_pages_backfill_node
SQL >
    %
    SELECT
        toDate(timestamp) AS date,
        device,
```

@@ -67,14 +67,24 @@ SQL >

```
pathname
TYPE COPY
TARGET_DATASOURCE analytics_pages_mv_1
```
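
For context, here is a minimal sketch of what the complete Copy Pipe could look like. Only the node name, `TYPE COPY`, `TARGET_DATASOURCE` and the two backfill timestamp parameters come from this guide; the source Data Source (`analytics_events`), the extra columns and the aggregation are illustrative assumptions:

```
NODE analytics_pages_backfill_node
SQL >
    %
    -- Illustrative query: the source Data Source, the column list and the aggregation
    -- are assumptions; the two timestamp parameters match the copy command used later
    SELECT
        toDate(timestamp) AS date,
        device,
        pathname,
        countState() AS hits
    FROM analytics_events
    WHERE timestamp >= {{DateTime(start_backfill_timestamp)}}
      AND timestamp < {{DateTime(end_backfill_timestamp)}}
    GROUP BY date, device, pathname

TYPE COPY
TARGET_DATASOURCE analytics_pages_mv_1
```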

## 2: Run CI

Make sure the changes are deployed correctly in the CI Tinybird Branch. Optionally, you can add automated tests or verify the changes from the `tmp_ci_*` Branch created as part of the CI pipeline.
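
For example, a quick manual check against the temporary CI Branch could look like this; the Branch name below is hypothetical, use the `tmp_ci_*` name created by your CI run:

```
# Hypothetical Branch name; take the real tmp_ci_* name from the CI job output
tb branch use tmp_ci_readme_updates_377
# Check that the new Data Source and the backfill Copy Pipe were deployed
tb datasource ls
tb pipe ls
```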

## 3: (For large datasets) Splitting the Data into Chunks for Backfilling

If your data source is large, you may run into a memory error like this:
```
error: "There was a problem while copying data: [Error] Memory limit (for query) exceeded. Make sure the query just process the required data. Contact us at support@tinybird.co for help or read this SQL tip: https://tinybird.co/docs/guides/best-practices-for-faster-sql.html#memory-limit-reached-title"
```

To avoid memory issues, you will need to break the backfill operation into smaller, manageable chunks. This approach reduces the memory load per query by processing only a subset of the data at a time. You can use the ***data source's sorting key*** to define each chunk.
Refer to [this guide](https://www.tinybird.co/docs/work-with-data/strategies/backfill-strategies#scenario-3-streaming-ingestion-with-incremental-timestamp-column) for more details.
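
For example, assuming `timestamp` is part of the Data Source's sorting key, the same Copy Pipe can be run once per month instead of once over the whole history (the date ranges below are illustrative):

```
# Illustrative monthly chunks; repeat until you reach the creation time of analytics_pages_mv_1
tb pipe copy run analytics_pages_backfill --node analytics_pages_backfill_node --param start_backfill_timestamp='2024-01-01 00:00:00' --param end_backfill_timestamp='2024-02-01 00:00:00' --wait --yes
tb pipe copy run analytics_pages_backfill --node analytics_pages_backfill_node --param start_backfill_timestamp='2024-02-01 00:00:00' --param end_backfill_timestamp='2024-03-01 00:00:00' --wait --yes
```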

## 4: Backfilling

Wait for the first event to be ingested into `analytics_pages_mv_1` and then proceed with the backfilling.
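
As a quick sanity check (a sketch; any query that proves rows are arriving works), you can confirm the new Materialized View is already receiving live data:

```
# The new MV Data Source should already contain at least one row from live ingestion
tb sql "select count() from analytics_pages_mv_1"
```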

@@ -93,10 +103,10 @@ tb sql "select timestamp from tinybird.datasources_ops_log where event_type = 'c

```
tb pipe copy run analytics_pages_backfill --node analytics_pages_backfill_node --param start_backfill_timestamp='2024-01-01 00:00:00' --param end_backfill_timestamp='$CREATED_AT' --wait --yes
```
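
`$CREATED_AT` above is the creation time of `analytics_pages_mv_1`, taken from the `tinybird.datasources_ops_log` query shown in this block. A sketch of capturing it in a shell variable; the `--format csv` flag and the `tail` parsing are assumptions, adapt them to your CLI version:

```
# Assumption: tb sql supports --format csv in your CLI version; otherwise copy the
# timestamp by hand from the query output
CREATED_AT=$(tb sql "select timestamp from tinybird.datasources_ops_log where event_type = 'create-datasource' and datasource_name = 'analytics_pages_mv_1'" --format csv | tail -n 1)
```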

## 5: Run CD

Merge the PR and make sure to run the backfill operation in the main Workspace.

## 6: Connect the downstream dependencies

Once the new Materialized View is created and synchronized, you can create another Pull Request to start using it in your endpoints.
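
As an illustration, a downstream endpoint Pipe would simply read from the new Data Source; the node name, the column list and the `countMerge` aggregation below are assumptions that must match whatever your Materialized View actually stores:

```
NODE endpoint
SQL >
    -- Hypothetical endpoint node reading from the new Materialized View Data Source;
    -- adjust the column list and the -Merge function to your real schema
    SELECT
        date,
        device,
        pathname,
        countMerge(hits) AS hits
    FROM analytics_pages_mv_1
    GROUP BY date, device, pathname
    ORDER BY date
```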
