[Proposal] Writable Warm Status API Model #14640

Open
e-emoto opened this issue Jul 3, 2024 · 6 comments
Assignees
Labels
discuss Issues intended to help drive brainstorming and decision making enhancement Enhancement or improvement to existing feature or request Search:Remote Search

Comments

@e-emoto
Contributor

e-emoto commented Jul 3, 2024

Is your feature request related to a problem? Please describe

The Status API in Writable Warm will list in-progress and failed index tierings. Because Writable Warm will eventually support other types of tiering for both dedicated and non-dedicated warm node clusters, the API needs to be extensible enough to cover those cases. The design below describes the API model for the Status API; the design for the rest of the API will be part of a follow-up task once some details about tiering metadata are settled.

Describe the solution you'd like

API Model:

The API takes a source and a target as input to filter which tierings are shown. It validates that both inputs are valid tiers, then uses them to find any tierings of the described type. The API should still work if only one of source or target is given, in which case it returns every tiering that matches the given value, which allows more flexible queries. By default, if neither source nor target is given, the Status API should return all in-progress or failed tierings for the specified indices, regardless of the tiering direction.
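
The filtering rules above can be sketched in a few lines of Python. This is only an illustration of the described semantics, not the planned implementation: the function name `filter_tierings`, the `VALID_TIERS` set, and the in-memory list of tiering records are all assumptions made for the example.

```python
from typing import Optional

# Tiers named in this proposal; more may be added later (assumption).
VALID_TIERS = {"hot", "warm"}


def filter_tierings(tierings, source: Optional[str] = None, target: Optional[str] = None):
    """Return the tierings that match the given source and/or target tier."""
    # Validate only the inputs that were actually supplied.
    for name, value in (("source", source), ("target", target)):
        if value is not None and value not in VALID_TIERS:
            raise ValueError(f"invalid {name} tier: {value!r}")
    # A filter that is None matches everything, so omitting both
    # source and target returns every tiering.
    return [
        t for t in tierings
        if (source is None or t["source"] == source)
        and (target is None or t["target"] == target)
    ]


tierings = [
    {"index": "test1", "source": "hot", "target": "warm", "status": "ongoing"},
    {"index": "test2", "source": "warm", "target": "hot", "status": "failed"},
]
hot_sourced = filter_tierings(tierings, source="hot")
```

Passing only `source` or only `target` narrows on that one field, which mirrors the "more flexible queries" case described above.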

API Request:

GET /<indexNameOrPattern>/_tier?source=hot&target=warm
GET /<indexNameOrPattern>/_tier?status=ongoing
GET /<indexNameOrPattern>/_tier?verbose=true/false

The API is a GET request with a few parameters. The index name in the path is required, but ‘_all’ or ‘*’ can be used to get tierings from all indices that match the parameters. The API will also support comma-separated index names.
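
As a rough sketch of how the path expression could be resolved, the Python below expands a comma-separated expression against a list of known index names. The helper `resolve_indices` is hypothetical, and it uses Python's `fnmatch` for wildcards, which accepts more pattern syntax than `*` alone; the real resolution would presumably reuse the existing index-expression handling.

```python
from fnmatch import fnmatchcase


def resolve_indices(expression, known_indices):
    """Expand '_all', wildcards, and comma-separated names into index names."""
    patterns = [p.strip() for p in expression.split(",") if p.strip()]
    if not patterns:
        # The index name in the path is required.
        raise ValueError("an index name or pattern is required")
    matched = []
    for index in known_indices:
        # '_all' matches every index; anything else is treated as a glob.
        if any(p == "_all" or fnmatchcase(index, p) for p in patterns):
            matched.append(index)
    return matched


known = ["test1", "logs", "test2"]
wildcard_match = resolve_indices("test*", known)
```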

API Parameters:

source = hot / warm (optional, no default)
target = hot / warm (optional, no default)

The values for the source and target parameters are the tiers, with source being the tier the index started in and target being the tier it is moving to.

status = failed / ongoing (optional, no default)

The values of the status parameter represent the state of the tiering. failed indicates that the tiering has failed and ongoing means the tiering process is in progress.

verbose = true / false (default false)

The verbose parameter determines whether the API response should include details like the shard relocation status, failure reason, and tiering start time.
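
To make the parameter rules concrete, here is a minimal Python sketch that parses a query string into the four parameters with the defaults listed above. The `TierStatusParams` dataclass and `parse_params` function are illustrative names only, and rejecting unknown values with an error is an assumption about how validation might behave.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs

TIERS = ("hot", "warm")
STATUSES = ("failed", "ongoing")


@dataclass
class TierStatusParams:
    source: Optional[str] = None   # optional, no default
    target: Optional[str] = None   # optional, no default
    status: Optional[str] = None   # optional, no default
    verbose: bool = False          # defaults to false


def parse_params(query_string: str) -> TierStatusParams:
    """Parse and validate the query string of a status request."""
    # Keep the last value if a parameter is repeated.
    raw = {key: values[-1] for key, values in parse_qs(query_string).items()}
    params = TierStatusParams()
    for name, allowed in (("source", TIERS), ("target", TIERS), ("status", STATUSES)):
        if name in raw:
            if raw[name] not in allowed:
                raise ValueError(f"invalid value for {name}: {raw[name]!r}")
            setattr(params, name, raw[name])
    if "verbose" in raw:
        if raw["verbose"] not in ("true", "false"):
            raise ValueError(f"invalid value for verbose: {raw['verbose']!r}")
        params.verbose = raw["verbose"] == "true"
    return params


parsed = parse_params("source=hot&target=warm&status=failed")
```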

API Response:

Success:

{
    "tiering_status" : [{
        "index" : "test1",
        "source": "hot",
        "target": "warm",
        "status" : "failed/ongoing"
    }]
}

With Verbose Flag:

{
    "tiering_status" : [{
        "index" : "test1",
        "source": "hot",
        "target": "warm",
        "status" : "ongoing",
        "start_time" : "2024-06-27T00:00:00Z",
        "failure_time" : "2024-06-27T10:00:00Z",
        "duration" : "10:00:00",
        "shards" : {
            "total" : 10,
            "successful" : 3,
            "failed" : 2,
            "ongoing" : 5
        },
        "ongoing" : [{
                "shard_id": 3,
                "node_id": "",
                "reason": ""
            },
            ...
        ],
        "failed" : [{
                "shard_id": 1,
                "node_id": "",
                "reason": ""
            },
            ...
        ]
    }]
}
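
One way the "shards" summary in the verbose response could be derived is by counting per-shard states. The Python below is a sketch under that assumption; the `shard_states` mapping from shard ID to state string is hypothetical and does not reflect the actual tiering metadata, which is still to be designed.

```python
from collections import Counter


def summarize_shards(shard_states):
    """Build the 'shards' summary object from a shard_id -> state mapping."""
    counts = Counter(shard_states.values())
    # Counter returns 0 for states that never occur.
    return {
        "total": len(shard_states),
        "successful": counts["successful"],
        "failed": counts["failed"],
        "ongoing": counts["ongoing"],
    }


# Hypothetical per-shard states for a 10-shard index.
shard_states = {
    0: "successful", 1: "failed", 2: "successful", 3: "ongoing", 4: "ongoing",
    5: "failed", 6: "ongoing", 7: "successful", 8: "ongoing", 9: "ongoing",
}
summary = summarize_shards(shard_states)
```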

Failure:

{
    "error": {
        "root_cause": [
            {
                "type": "",
                "reason": ""
            }
        ]
    },
    "status": xxx
}

Example API Use Cases:

Get All Ongoing Tierings:

GET /_all/_tier?status=ongoing

{
    "tiering_status" : [{
        "index" : "test1",
        "source": "hot",
        "target": "warm",
        "status" : "ongoing"
    },{
        "index" : "test2",
        "source": "warm",
        "target": "hot",
        "status" : "ongoing"
    },

    ...

    ]
}

Get All Failed Hot To Warm Tierings:

GET /_all/_tier?source=hot&target=warm&status=failed

{
    "tiering_status" : [{
        "index" : "test3",
        "source": "hot",
        "target": "warm",
        "status" : "failed"
    },{
        "index" : "test4",
        "source": "hot",
        "target": "warm",
        "status" : "failed"
    },

    ...

    ]
}

Get Shard Details for a Specific Index Tiering:

GET /target_index/_tier?verbose=true

{
    "tiering_status" : [{
        "index" : "target_index",
        "source": "hot",
        "target": "warm",
        "status" : "ongoing",
        "start_time" : "2024-06-27T00:00:00Z",
        "failure_time" : "",
        "duration" : "",
        "shards" : {
            "total" : 10,
            "successful" : 4,
            "failed" : 0,
            "ongoing" : 6
        },
        "ongoing" : [{
                "shard_id": 3,
                "node_id": "",
                "reason": ""
            },
            ...
        ],
        "failed" : []
    }]
}

Related component

Search:Remote Search

Describe alternatives you've considered

No response

Additional context

This issue is for getting feedback on the API structure, and will be followed up with a PR for the API spec and a low level design description.

Related issues:
#14679
#13294

@e-emoto e-emoto added enhancement Enhancement or improvement to existing feature or request untriaged labels Jul 3, 2024
@sohami sohami added the discuss Issues intended to help drive brainstorming and decision making label Jul 3, 2024
@yigithub yigithub moved this to Release v2.16 (7/23/24) in Tiered Writable Index Jul 3, 2024
@e-emoto
Contributor Author

e-emoto commented Jul 3, 2024

@andrross @dblock @mch2 @ankitkala @rayshrey Any feedback on this would be greatly appreciated, thanks!

@e-emoto e-emoto changed the title from "[Proposal] Shasta Status API Model" to "[Proposal] Writable Warm Status API Model" Jul 3, 2024
@mch2 mch2 linked a pull request Jul 8, 2024 that will close this issue
@mch2 mch2 removed a link to a pull request Jul 8, 2024
@jed326
Collaborator

jed326 commented Jul 9, 2024

Thanks @e-emoto! 2 quick questions from me:

  • Do we need a pagination mechanism or do we typically expect the number of tiering_status items to be relatively low?
  • Do we need a corresponding tabular API? Like GET _cat/snapshots vs. GET _snapshot?

Looking forward to any PRs!

@jed326 jed326 removed the untriaged label Jul 9, 2024
@e-emoto
Contributor Author

e-emoto commented Jul 9, 2024

Hi @jed326, thanks for your response.
Regarding your first question, we don't think we need a pagination mechanism because the number of items returned is not unbounded and should usually not be very high. As for the second question, we are considering having a flag that makes the response tabular.
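
For illustration only, a tabular rendering of the non-verbose response could look like the Python sketch below. The `render_table` helper and the column order are assumptions; the actual flag and output format have not been decided in this thread.

```python
def render_table(rows, columns=("index", "source", "target", "status")):
    """Render tiering entries as a whitespace-aligned table."""
    # Each column is as wide as its longest cell, header included.
    widths = [max(len(col), *(len(str(row[col])) for row in rows)) for col in columns]
    lines = [" ".join(col.ljust(w) for col, w in zip(columns, widths)).rstrip()]
    for row in rows:
        cells = (str(row[col]).ljust(w) for col, w in zip(columns, widths))
        lines.append(" ".join(cells).rstrip())
    return "\n".join(lines)


rows = [
    {"index": "test1", "source": "hot", "target": "warm", "status": "ongoing"},
    {"index": "test2", "source": "warm", "target": "hot", "status": "failed"},
]
table = render_table(rows)
```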

@dblock
Member

dblock commented Jul 9, 2024

Overall the API as {index}/_tier looks consistent with other APIs to me.

  • Should it be _tiers? Should the response contain tier(s) and not tiering_status?
  • Notice that it's _cat/snapshots and _snapshot, do we need something similar like tier vs. tiers?
  • Is it possible that in the future we'd want to tier something other than an index? In which case would we prefer storagetier or storage_tier? Or is storage_tier clearer regardless?

@sohami
Collaborator

sohami commented Jul 11, 2024

Notice that it's _cat/snapshots and _snapshot, do we need something similar like tier vs. tiers?

@e-emoto @dblock I think we can align this status API with the recovery API, which shows the recovery status of different index shards. There are 2 variants: a) /_cat/recovery, /_cat/recovery/{index} and b) GET /_recovery, GET /{index}/_recovery. So for the index tiering status we can have:

  1. /_cat/tier or even /_cat/tiering. I think we can have it without s since it is showing status about the action and not the resource like all possible snapshots or indices.
  2. GET /_tiering, GET /{index}/_tiering

Should the response contain tier(s) and not tiering_status?

I think we can remove tiering_status from the response and use the format below (sort of a map), where the index name is the key and the status is the value.

{
    "test1" : {
        "source": "hot",
        "target": "warm",
        "status" : "ongoing"
    },
    "test2": {
        "source": "warm",
        "target": "hot",
        "status" : "ongoing"
    }
}
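
The two response shapes carry the same information, as the following Python sketch suggests by converting the list form from the proposal into the map form above. The function name `to_index_map` is made up for the example.

```python
def to_index_map(tiering_status):
    """Convert the list-based response into a map keyed by index name."""
    return {
        entry["index"]: {k: v for k, v in entry.items() if k != "index"}
        for entry in tiering_status
    }


tiering_status = [
    {"index": "test1", "source": "hot", "target": "warm", "status": "ongoing"},
    {"index": "test2", "source": "warm", "target": "hot", "status": "ongoing"},
]
by_index = to_index_map(tiering_status)
```

One consequence of the map form is that each index can appear only once, which fits a model where an index has at most one tiering in progress at a time (an assumption, not something stated in this thread).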

Is it possible that in the future we'd want to tier something other than an index? In which case would we prefer storagetier or storage_tier? Or is storage_tier clearer regardless?

That is a good question. Index is a logical entity and we are tiering the data but in the unit of index, hence we are keeping it generic like tier. There can be compute nodes as well which are configured only for hot or warm tiers. So I think tier will fit well from that perspective too. Whether tier is referenced in context of index or node that will be determined by index/node level setting/attribute.

@neetikasinghal
Contributor

Thanks @e-emoto.
Instead of overloading the status API to display one variant in JSON (with the verbose flag) and the other in tabular format (without it), we can have separate APIs as suggested by @sohami. It would be good to add details on the _cat and GET tiering APIs (this can be in a separate issue).
We could also consider making verbose default to true for the status API if the high-level status can be provided by the _cat/GET tiering APIs.

A couple of other suggestions for the status API:

  • Can we support a query param, local, that the end user can provide to run the API on the data node instead of the cluster manager node?
  • Can we also support a query param on duration, which could help filter for tiering operations that are long-running?
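
A duration filter along these lines could be sketched as follows in Python. The `longer_than` helper, the idea of passing a minimum duration, and the reliance on the verbose `start_time` field are all assumptions for illustration; no such parameter is defined in the proposal yet.

```python
from datetime import datetime, timedelta, timezone


def parse_start(timestamp):
    """Parse a UTC timestamp in the format used by the examples, e.g. 2024-06-27T00:00:00Z."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def longer_than(tierings, min_duration, now):
    """Keep only tierings that have been running for at least min_duration."""
    return [
        t for t in tierings
        if now - parse_start(t["start_time"]) >= min_duration
    ]


now = datetime(2024, 6, 27, 10, 0, 0, tzinfo=timezone.utc)
tierings = [
    {"index": "test1", "status": "ongoing", "start_time": "2024-06-27T00:00:00Z"},
    {"index": "test2", "status": "ongoing", "start_time": "2024-06-27T09:30:00Z"},
]
long_running = longer_than(tierings, timedelta(hours=1), now)
```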

Projects
Status: Now(This Quarter)
Status: Release v2.16 (7/23/24)
Development

No branches or pull requests

5 participants