fix load job #2033

Open: wants to merge 5 commits into `master`.

# CREATE SYNC JOB

## Description

The data synchronization (Sync Job) function allows users to submit a resident (long-running) data synchronization job. The job reads the Binlog from a specified remote source to incrementally synchronize the CDC (Change Data Capture) records of data update operations performed in a MySQL database. Currently, the synchronization job only supports connecting to Canal: it obtains parsed Binlog data from the Canal server and imports it into Doris.

Users can view the status of synchronization jobs via [SHOW SYNC JOB](../../../../sql-manual/sql-statements/data-modification/load-and-export/SHOW-SYNC-JOB).

## Syntax

```sql
CREATE SYNC [<db>.]<job_name>
(<channel_desc> [, ... ])
<binlog_desc>

-- where:
<channel_desc>
  : FROM <mysql_db>.<src_tbl> INTO <des_tbl> [ <columns_mapping> ]

<binlog_desc>
  : FROM BINLOG ("<key>" = "<value>" [, ... ])
```

## Required Parameters

**1. `<job_name>`**

> Specifies the name of the synchronization job, which must be unique within the database. Only one job with the same `<job_name>` can be running at a time.

**2. `<channel_desc>`**

> Describes the mapping relationship between the MySQL source table and the Doris target table.
>
> - **`<mysql_db>.<src_tbl>`**: Specifies the source table in MySQL, qualified by its database name.
> - **`<des_tbl>`**: Specifies the target table in Doris. Only Unique Key model tables are supported, and the batch delete function must be enabled on the table (see the batch delete function of `ALTER TABLE` for how to enable it).
> - **`<columns_mapping>`** (Optional): Defines the mapping between the columns of the source and target tables. If omitted, FE maps the columns of the source table to those of the target table one-to-one, in order. The form `col_name = expr` is not supported.
>
> For example, if the target table columns are `(k1, k2, v1)`:
>
> - `(k2, k1, v1)` swaps the order of columns `k1` and `k2`.
> - `(k2, k1, v1, dummy_column)` swaps `k1` and `k2` and ignores the fourth column of the source data.

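
The column mapping rules above can be illustrated with a minimal sketch. It assumes a hypothetical Doris Unique Key table `test_tbl` with columns `(k1, k2, v1)` and a hypothetical four-column MySQL source table `mysql_db1.tbl1`. The job name `job_mapping` and the Canal connection values are placeholders.

```sql
-- Hypothetical tables: test_tbl has columns (k1, k2, v1);
-- mysql_db1.tbl1 has four columns.
-- The mapping list follows the order of the source columns:
--   1st source column -> k2
--   2nd source column -> k1
--   3rd source column -> v1
--   4th source column -> dummy_column (not in test_tbl, so it is ignored)
CREATE SYNC `test_db`.`job_mapping`
(
    FROM `mysql_db1`.`tbl1` INTO `test_tbl` (k2, k1, v1, dummy_column)
)
FROM BINLOG
(
    "type" = "canal",
    "canal.server.ip" = "127.0.0.1",
    "canal.server.port" = "11111",
    "canal.destination" = "example",
    "canal.username" = "",
    "canal.password" = ""
);
```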
**3. `<binlog_desc>`**

> Describes the remote data source for the Binlog. Currently, only Canal is supported.
>
> The supported properties are `type` and the Canal data source properties, whose keys are prefixed with `canal.`:
>
> - **`type`**: Type of the data source. Currently, only `canal` is supported.
> - **`canal.server.ip`**: Address of the Canal server.
> - **`canal.server.port`**: Port of the Canal server.
> - **`canal.destination`**: Identifier of the Canal instance.
> - **`canal.batchSize`** (Optional): Maximum batch size to fetch. Defaults to 8192.
> - **`canal.username`**: Username for the Canal instance.
> - **`canal.password`**: Password for the Canal instance.
> - **`canal.debug`** (Optional): If set to `true`, prints each batch and the details of every row of data.
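
The following sketch shows the optional properties in use. It reuses the placeholder connection values from the examples on this page and also sets `canal.batchSize` and `canal.debug`. The job name `job_debug` and the batch size `4096` are illustrative assumptions, not recommended values.

```sql
-- Placeholder connection values. "canal.batchSize" lowers the maximum batch
-- size from the default 8192, and "canal.debug" prints each batch and the
-- details of every row.
CREATE SYNC `test_db`.`job_debug`
(
    FROM `mysql_db1`.`tbl1` INTO `test_tbl`
)
FROM BINLOG
(
    "type" = "canal",
    "canal.server.ip" = "127.0.0.1",
    "canal.server.port" = "11111",
    "canal.destination" = "example",
    "canal.username" = "",
    "canal.password" = "",
    "canal.batchSize" = "4096",
    "canal.debug" = "true"
);
```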

## Usage Notes

- Currently, the synchronization job only supports connecting to a Canal server.
- Only one synchronization job with the same `<job_name>` can run concurrently within a database.
- The target table specified in `<channel_desc>` must be a Unique Key model table with the batch delete function enabled.
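
If the batch delete function is not yet enabled on the target table, it can be enabled with `ALTER TABLE`. This is a minimal sketch that assumes a hypothetical Unique Key table `test_db.test_tbl` and a Doris version that supports the `ENABLE FEATURE "BATCH_DELETE"` clause. Recent versions may already enable the function by default on Unique Key tables.

```sql
-- Hypothetical Unique Key table test_db.test_tbl.
-- Enable the batch delete function required for a sync job's target table.
ALTER TABLE `test_db`.`test_tbl` ENABLE FEATURE "BATCH_DELETE";
```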

## Access Control Requirements

Users executing this SQL command must have at least the following privilege:

| Privilege | Object | Notes |
| :---------------- | :------------- | :---------------------------- |
| LOAD_PRIV | Table | Only users or roles with the LOAD_PRIV privilege on the target table can perform this operation. |
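
For illustration, an administrator could grant this privilege as shown below. The user `'sync_user'@'%'` is hypothetical. Check the `GRANT` reference for your Doris version to confirm the exact syntax.

```sql
-- Hypothetical user; grants LOAD_PRIV on the target table test_db.test_tbl only.
GRANT LOAD_PRIV ON test_db.test_tbl TO 'sync_user'@'%';
```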

## Examples

1. **Create a simple synchronization job**

Create a synchronization job named `job1` in the `test_db` database that maps the MySQL source table `mysql_db1.tbl1` to the Doris target table `test_tbl`, connecting to a local Canal server.

```sql
CREATE SYNC `test_db`.`job1`
(
FROM `mysql_db1`.`tbl1` INTO `test_tbl`
)
FROM BINLOG
(
"type" = "canal",
"canal.server.ip" = "127.0.0.1",
"canal.server.port" = "11111",
"canal.destination" = "example",
"canal.username" = "",
"canal.password" = ""
);
```

2. **Create a synchronization job with multiple channels and explicit column mapping**

Create a synchronization job named `job1` in the `test_db` database that maps multiple MySQL source tables one-to-one to Doris target tables, with explicitly specified column mappings.

```sql
CREATE SYNC `test_db`.`job1`
(
FROM `mysql_db`.`t1` INTO `test1` (k1, k2, v1),
FROM `mysql_db`.`t2` INTO `test2` (k3, k4, v2)
)
FROM BINLOG
(
"type" = "canal",
"canal.server.ip" = "xx.xxx.xxx.xx",
"canal.server.port" = "12111",
"canal.destination" = "example",
"canal.username" = "username",
"canal.password" = "password"
);
```
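
To check the status of the jobs created in the examples above, you can list the synchronization jobs in `test_db` with `SHOW SYNC JOB`. This is a minimal follow-up sketch.

```sql
-- Lists all sync jobs in test_db, including job1 from the examples above.
SHOW SYNC JOB FROM `test_db`;
```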

# PAUSE SYNC JOB

## Description

Pause a running resident data synchronization job, identified by `job_name`, in a database. The paused job stops synchronizing data and retains its latest consumption position until the user resumes it.

## Syntax

```sql
PAUSE SYNC JOB [<db>.]<job_name>
```

## Required Parameters

**1. `<job_name>`**

> Specifies the name of the synchronization job to be paused.
> If a database is specified using the `[<db>.]` prefix, the job is located in that database; otherwise, the current database is used.
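
The two ways of locating the job described above can be sketched as follows. The sketch assumes a hypothetical job `job1` in the database `test_db`.

```sql
-- Option 1: qualify the job name with its database.
PAUSE SYNC JOB `test_db`.`job1`;

-- Option 2: switch to the database first, then use the bare job name.
USE `test_db`;
PAUSE SYNC JOB `job1`;
```

Both forms pause the same job, so you only need to run one of them.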

## Access Control Requirements

Any user or role can perform this operation.

## Examples

1. Pause the data synchronization job named `job_name`.

```sql
PAUSE SYNC JOB `job_name`;
```
# RESUME SYNC JOB

## Description

Resume a paused resident data synchronization job in a database, identified by its `job_name`. Once resumed, the job continues to synchronize data from the latest consumption position retained before it was paused.

## Syntax

```sql
RESUME SYNC JOB [<db>.]<job_name>
```

## Required Parameters

**1. `<job_name>`**

> Specifies the name of the data synchronization job to be resumed.
> If a database is specified with the `[<db>.]` prefix, the job is located in that database; otherwise, the current database is used.

## Access Control Requirements

Any user or role can perform this operation.

## Examples

1. Resume the data synchronization job named `job_name`.

```sql
RESUME SYNC JOB `job_name`;
```
# SHOW SYNC JOB

## Description

This statement displays the status of resident data synchronization jobs in the specified database, or in the current database if `FROM db_name` is omitted.

## Syntax

```sql
SHOW SYNC JOB [FROM db_name]
```

## Access Control Requirements

Users executing this SQL command must have at least one of the following privileges:

| Privilege | Object | Notes |
| :------------------------------------------------------------------------ | :------------- | :------------------------------------- |
| ADMIN_PRIV, SELECT_PRIV, LOAD_PRIV, ALTER_PRIV, CREATE_PRIV, DROP_PRIV, SHOW_VIEW_PRIV | Database `db_name` | This operation requires at least one of the listed privileges on the target database. |
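
For example, a read-only user could be given one of the listed privileges on the database, as shown below. The user `'sync_viewer'@'%'` is hypothetical.

```sql
-- Hypothetical user; SELECT_PRIV on all tables of test_db satisfies the requirement above.
GRANT SELECT_PRIV ON test_db.* TO 'sync_viewer'@'%';
```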

## Examples

1. Display the status of all data synchronization jobs in the current database.

```sql
SHOW SYNC JOB;
```

2. Display the status of all data synchronization jobs in the `test_db` database.

```sql
SHOW SYNC JOB FROM `test_db`;
```

# STOP SYNC JOB

## Description

Stop a running resident data synchronization job in a database by specifying its `job_name`. Once stopped, the job will cease synchronizing data and release its occupied resources.
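
Pausing keeps the consumption position so that the job can be resumed later. Stopping is typically used instead for a job that is no longer needed. The following minimal sketch assumes a hypothetical job `job1` in the database `test_db`.

```sql
-- Stop the job; it ceases synchronizing data and releases its resources.
STOP SYNC JOB `test_db`.`job1`;

-- Confirm the change in the job's status.
SHOW SYNC JOB FROM `test_db`;
```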

## Syntax

```sql
STOP SYNC JOB [<db>.]<job_name>
```

## Required Parameters

**1. `<job_name>`**

> Specifies the name of the data synchronization job to be stopped.
> If a database is specified with the `[<db>.]` prefix, the job is located in that database; otherwise, the current database is used.

## Access Control Requirements

Any user or role can perform this operation.

## Examples

1. Stop the data synchronization job named `job_name`.

```sql
STOP SYNC JOB `job_name`;
```