Merge pull request #380 from tinybirdco/readme-updates-377
Readme updates for backfill #377
alrocar authored Feb 26, 2025
2 parents 57a30a9 + df55318 commit 04e6c6b
Showing 1 changed file with 17 additions and 7 deletions.
24 changes: 17 additions & 7 deletions change_column_type_materialized_view/README.md

@@ -6,9 +6,9 @@ To change a column type in a Materialized View Data Source is a process that nee

This change requires re-creating the Materialized View and populating it again with all the data, without stopping ingestion.

For that, the steps will be:

1. Create a new Materialized View (Pipe and Data Source) to change the column type.
2. Run CI.
3. Backfill the new Materialized View with the data ingested before its creation.
4. Run CD and run the backfill in the main Workspace.
@@ -48,7 +48,7 @@ Create a Copy Pipe `analytics_pages_backfill.pipe` for backfilling purposes:

```
NODE analytics_pages_backfill_node
SQL >
    %
    SELECT
        toDate(timestamp) AS date,
        device,
```

@@ -67,14 +67,24 @@ SQL >

```
pathname
TYPE COPY
TARGET_DATASOURCE analytics_pages_mv_1
```
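
For context, here is a minimal sketch of what the complete Copy Pipe could look like. Only the node name, `TYPE COPY`, `TARGET_DATASOURCE` and the two backfill timestamp parameters come from this guide; the source Data Source (`analytics_events`), the extra columns and the aggregation are illustrative assumptions:

```
NODE analytics_pages_backfill_node
SQL >
    %
    -- Illustrative query: the source Data Source, the column list and the aggregation
    -- are assumptions; the two timestamp parameters match the copy command used later
    SELECT
        toDate(timestamp) AS date,
        device,
        pathname,
        countState() AS hits
    FROM analytics_events
    WHERE timestamp >= {{DateTime(start_backfill_timestamp)}}
      AND timestamp < {{DateTime(end_backfill_timestamp)}}
    GROUP BY date, device, pathname

TYPE COPY
TARGET_DATASOURCE analytics_pages_mv_1
```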

## 2: Run CI

Make sure the changes are deployed correctly in the CI Tinybird Branch. Optionally, you can add automated tests or verify the changes from the `tmp_ci_*` Branch created as part of the CI pipeline.
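
For example, a quick manual check against the temporary CI Branch could look like this; the Branch name below is hypothetical, use the `tmp_ci_*` name created by your CI run:

```
# Hypothetical Branch name; take the real tmp_ci_* name from the CI job output
tb branch use tmp_ci_readme_updates_377
# Check that the new Data Source and the backfill Copy Pipe were deployed
tb datasource ls
tb pipe ls
```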

## 3: (For large datasets) Splitting the Data into Chunks for Backfilling

If your data source is large, you may run into a memory error like this:
```
error: "There was a problem while copying data: [Error] Memory limit (for query) exceeded. Make sure the query just process the required data. Contact us at support@tinybird.co for help or read this SQL tip: https://tinybird.co/docs/guides/best-practices-for-faster-sql.html#memory-limit-reached-title"
```

To avoid memory issues, you will need to break the backfill operation into smaller, manageable chunks. This approach reduces the memory load per query by processing only a subset of the data at a time. You can use the ***data source's sorting key*** to define each chunk.
Refer to [this guide](https://www.tinybird.co/docs/work-with-data/strategies/backfill-strategies#scenario-3-streaming-ingestion-with-incremental-timestamp-column) for more details.
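
For example, assuming `timestamp` is part of the Data Source's sorting key, the same Copy Pipe can be run once per month instead of once over the whole history (the date ranges below are illustrative):

```
# Illustrative monthly chunks; repeat until you reach the creation time of analytics_pages_mv_1
tb pipe copy run analytics_pages_backfill --node analytics_pages_backfill_node --param start_backfill_timestamp='2024-01-01 00:00:00' --param end_backfill_timestamp='2024-02-01 00:00:00' --wait --yes
tb pipe copy run analytics_pages_backfill --node analytics_pages_backfill_node --param start_backfill_timestamp='2024-02-01 00:00:00' --param end_backfill_timestamp='2024-03-01 00:00:00' --wait --yes
```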

## 4: Backfilling

Wait for the first event to be ingested into `analytics_pages_mv_1` and then proceed with the backfilling.
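
As a quick sanity check (a sketch; any query that proves rows are arriving works), you can confirm the new Materialized View is already receiving live data:

```
# The new MV Data Source should already contain at least one row from live ingestion
tb sql "select count() from analytics_pages_mv_1"
```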

@@ -93,10 +103,10 @@ tb sql "select timestamp from tinybird.datasources_ops_log where event_type = 'c

```
tb pipe copy run analytics_pages_backfill --node analytics_pages_backfill_node --param start_backfill_timestamp='2024-01-01 00:00:00' --param end_backfill_timestamp='$CREATED_AT' --wait --yes
```
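
`$CREATED_AT` above is the creation time of `analytics_pages_mv_1`, taken from the `tinybird.datasources_ops_log` query shown in this block. A sketch of capturing it in a shell variable; the `--format csv` flag and the `tail` parsing are assumptions, adapt them to your CLI version:

```
# Assumption: tb sql supports --format csv in your CLI version; otherwise copy the
# timestamp by hand from the query output
CREATED_AT=$(tb sql "select timestamp from tinybird.datasources_ops_log where event_type = 'create-datasource' and datasource_name = 'analytics_pages_mv_1'" --format csv | tail -n 1)
```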

## 5: Run CD

Merge the PR and make sure to run the backfill operation in the main Workspace.

## 6: Connect the downstream dependencies

Once the new Materialized View is created and synchronized, you can create another Pull Request to start using it in your endpoints.
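
As an illustration, a downstream endpoint Pipe would simply read from the new Data Source; the node name, the column list and the `countMerge` aggregation below are assumptions that must match whatever your Materialized View actually stores:

```
NODE endpoint
SQL >
    -- Hypothetical endpoint node reading from the new Materialized View Data Source;
    -- adjust the column list and the -Merge function to your real schema
    SELECT
        date,
        device,
        pathname,
        countMerge(hits) AS hits
    FROM analytics_pages_mv_1
    GROUP BY date, device, pathname
    ORDER BY date
```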
