-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for issue Runtime Errors in Action Execution result in an Unrecoverable Conversation #6852
base: main
Are you sure you want to change the base?
Conversation
From the log:
I think it's complaining that it has an action without observation, or with an observation without Could you enable litellm verbose logging, e.g. replace with something like this here:
Then it should print out in console exactly what it sends to the LLM API. |
else: | ||
break | ||
if message.tool_call_id: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, in a typical agent history, the action-observation 'pairs' linked by tool_call_id
are the vast majority. So I'm afraid that doing this means we have no more prompt caching, basically. 🤔
I think prompt caching is not really the problem we see here, but could be wrong. Maybe you can disable it completely from configuration to see if there is still an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I was trying to figure this out, I actually added some logging in a local branch to see exactly what was being passed to LiteLLM here. The result was very verbose, but over multiple invocations I saw that the difference between a working sample and a failing was the cache directive. One such example is below:
{
'messages': [
{
'content': [
{
'type': 'text',
'text': 'You are OpenHands agent, a helpful AI assistant that can interact with a computer to solve tasks.\n<IMPORTANT>\n* If user provides a path, you should NOT assume it\'s relative to the current working directory. Instead, you should explore the file system to find the file before working on it.\n* When configuring git credentials, use "openhands" as the user.name and "openhands@all-hands.dev" as the user.email by ******, unless explicitly instructed otherwise.\n* The assistant MUST NOT include comments in the code unless they are necessary to describe non-obvious behavior.\n</IMPORTANT>',
'cache_control': {
'type': 'ephemeral'
}
}
],
'role': 'system'
},
{
'content': [
{
'type': 'text',
'text': '<RUNTIME_INFORMATION>\nThe user has access to the following hosts for accessing a web application,\neach of which has a corresponding port:\n* https://work-1-kcnvachtmtlsuyzj.staging-runtime.all-hands.dev (port 12000)\n* https://work-2-kcnvachtmtlsuyzj.staging-runtime.all-hands.dev (port 12001)\n\nWhen starting a web server, use the corresponding ports. You should also\nset any options to allow iframes and CORS requests, and allow the server to\nbe accessed from any host (e.g. 0.0.0.0).\n</RUNTIME_INFORMATION>'
},
{
'type': 'text',
'text': 'I\'m going to be testing the environment in which you run bash scripts. Please execute a bash script that prints "Hello World"'
}
],
'role': 'user'
},
{
'content': [
],
'role': 'assistant',
'tool_calls': [
{
'id': 'toolu_01ScT97bN9iq3KsDpYNY5Ppt',
'type': 'function',
'function': {
'name': 'execute_bash',
'arguments': '{"command": "echo \\"Hello World\\""}'
}
}
]
},
{
'content': [
{
'type': 'text',
'text': 'Hello World\n[The command completed with exit code 0.]\n[Current working directory: /workspace]\n[Python interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]\n[Command finished with exit code 0]'
}
],
'role': 'tool',
'tool_call_id': 'toolu_01ScT97bN9iq3KsDpYNY5Ppt',
'name': 'execute_bash'
},
{
'content': [
{
'type': 'text',
'text': 'I\'ve executed a simple bash command that prints "Hello World" using the `echo` command. The command completed successfully with exit code 0, and we can see that:\n1. The current working directory is `/workspace`\n2. There\'s a Python interpreter available at `/openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python`\n\nIs there anything specific about the bash environment you\'d like me to test?'
}
],
'role': 'assistant'
},
{
'content': [
{
'type': 'text',
'text': "Please execute the same script again. I've stopped the bash environment so it should throw an error this time",
'cache_control': {
'type': 'ephemeral'
}
}
],
'role': 'user'
},
{
'content': [
],
'role': 'assistant',
'tool_calls': [
{
'id': 'toolu_01EteZsAJEiMV1uJCWsaHcCQ',
'type': 'function',
'function': {
'name': 'execute_bash',
'arguments': '{"command": "echo \\"Hello World\\""}'
}
}
]
},
{
'content': [
{
'type': 'text',
'text': 'The action has not been executed.\n[Error occurred in processing last action]'
}
],
'role': 'tool',
'cache_control': {
'type': 'ephemeral'
},
'tool_call_id': 'toolu_01EteZsAJEiMV1uJCWsaHcCQ',
'name': 'execute_bash'
},
{
'content': [
{
'type': 'text',
'text': "Please execute the same script again. I've restarted the bash environment so it should no longer throw an error",
'cache_control': {
'type': 'ephemeral'
}
}
],
'role': 'user'
}
],
'tools': [
{
'type': 'function',
'function': {
'name': 'execute_bash',
'description': 'Execute a bash command in the terminal.\n* Long running commands: For commands that may run indefinitely, it should be run in the background and the output should be redirected to a file, e.g. command = `python3 app.py > server.log 2>&1 &`.\n* Interact with running process: If a bash command returns exit code `-1`, this means the process is not yet finished. By setting `is_input` to `true`, the assistant can interact with the running process and send empty `command` to retrieve any additional logs, or send additional text (set `command` to the text) to STDIN of the running process, or send command like `C-c` (Ctrl+C), `C-d` (Ctrl+D), `C-z` (Ctrl+Z) to interrupt the process.\n* One command at a time: You can only execute one bash command at a time. If you need to run multiple commands sequentially, you can use `&&` or `;` to chain them together.\n',
'parameters': {
'type': 'object',
'properties': {
'command': {
'type': 'string',
'description': 'The bash command to execute. Can be empty string to view additional logs when previous exit code is `-1`. Can be `C-c` (Ctrl+C) to interrupt the currently running process. Note: You can only execute one bash command at a time. If you need to run multiple commands sequentially, you can use `&&` or `;` to chain them together.'
},
'is_input': {
'type': 'string',
'description': 'If True, the command is an input to the running process. If False, the command is a bash command to be executed in the terminal. Default is False.',
'enum': [
'true',
'false'
]
}
},
'required': [
'command'
]
}
}
},
{
'type': 'function',
'function': {
'name': 'finish',
'description': 'Finish the interaction when the task is complete OR if the assistant cannot proceed further with the task.'
}
},
{
'type': 'function',
'function': {
'name': 'web_read',
'description': 'Read (convert to markdown) content from a webpage. You should prefer using the `web_read` tool over the `browser` tool, but do use the `browser` tool if you need to interact with a webpage (e.g., click a button, fill out a form, etc.).\n\nYou may use the `web_read` tool to read content from a webpage, and even search the webpage content using a Google search query (e.g., url=`https://www.google.com/search?q=YOUR_QUERY`).\n',
'parameters': {
'type': 'object',
'properties': {
'url': {
'type': 'string',
'description': 'The URL of the webpage to read. You can also use a Google search query here (e.g., `https://www.google.com/search?q=YOUR_QUERY`).'
}
},
'required': [
'url'
]
}
}
},
{
'type': 'function',
'function': {
'name': 'browser',
'description': 'Interact with the browser using Python code. Use it ONLY when you need to interact with a webpage.\n\nSee the description of "code" parameter for more details.\n\nMultiple actions can be provided at once, but will be executed sequentially without any feedback from the page.\nMore than 2-3 actions usually leads to failure or unexpected behavior. Example:\nfill(\'a12\', \'example with "quotes"\')\nclick(\'a51\')\nclick(\'48\', button=\'middle\', modifiers=[\'Shift\'])\n',
'parameters': {
'type': 'object',
'properties': {
'code': {
'type': 'string',
'description': 'The Python code that interacts with the browser.\n\nThe following 15 functions are available. Nothing else is supported.\n\ngoto(url: str)\n Description: Navigate to a url.\n Examples:\n goto(\'http://www.example.com\')\n\ngo_back()\n Description: Navigate to the previous page in history.\n Examples:\n go_back()\n\ngo_forward()\n Description: Navigate to the next page in history.\n Examples:\n go_forward()\n\nnoop(wait_ms: float = 1000)\n Description: Do nothing, and optionally wait for the given time (in milliseconds).\n You can use this to get the current page content and/or wait for the page to load.\n Examples:\n noop()\n\n noop(500)\n\nscroll(delta_x: float, delta_y: float)\n Description: Scroll horizontally and vertically. Amounts in pixels, positive for right or down scrolling, negative for left or up scrolling. Dispatches a wheel event.\n Examples:\n scroll(0, 200)\n\n scroll(-50.2, -100.5)\n\nfill(bid: str, value: str)\n Description: Fill out a form field. It focuses the element and triggers an input event with the entered text. It works for <input>, <textarea> and [contenteditable] elements.\n Examples:\n fill(\'237\', \'example value\')\n\n fill(\'45\', \'multi-line\nexample\')\n\n fill(\'a12\', \'example with "quotes"\')\n\nselect_option(bid: str, options: str | list[str])\n Description: Select one or multiple options in a <select> element. You can specify option value or label to select. Multiple options can be selected.\n Examples:\n select_option(\'a48\', \'blue\')\n\n select_option(\'c48\', [\'red\', \'green\', \'blue\'])\n\nclick(bid: str, button: Literal[\'left\', \'middle\', \'right\'] = \'left\', modifiers: list[typing.Literal[\'Alt\', \'Control\', \'ControlOrMeta\', \'Meta\', \'Shift\']] = [])\n Description: Click an element.\n Examples:\n click(\'a51\')\n\n click(\'b22\', button=\'right\')\n\n click(\'48\', button=\'middle\', modifiers=[\'Shift\'])\n\ndblclick(bid: str, button: Literal[\'left\', \'middle\', \'right\'] = \'left\', modifiers: list[typing.Literal[\'Alt\', \'Control\', \'ControlOrMeta\', \'Meta\', \'Shift\']] = [])\n Description: Double click an element.\n Examples:\n dblclick(\'12\')\n\n dblclick(\'ca42\', button=\'right\')\n\n dblclick(\'178\', button=\'middle\', modifiers=[\'Shift\'])\n\nhover(bid: str)\n Description: Hover over an element.\n Examples:\n hover(\'b8\')\n\npress(bid: str, key_comb: str)\n Description: Focus the matching element and press a combination of keys. It accepts the logical key names that are emitted in the keyboardEvent.key property of the keyboard events: Backquote, Minus, Equal, Backslash, Backspace, Tab, Delete, Escape, ArrowDown, End, Enter, Home, Insert, PageDown, PageUp, ArrowRight, ArrowUp, F1 - F12, Digit0 - Digit9, KeyA - KeyZ, etc. You can alternatively specify a single character you\'d like to produce such as "a" or "#". Following modification shortcuts are also supported: Shift, Control, Alt, Meta, ShiftLeft, ControlOrMeta. ControlOrMeta resolves to Control on Windows and Linux and to Meta on macOS.\n Examples:\n press(\'88\', \'Backspace\')\n\n press(\'a26\', \'ControlOrMeta+a\')\n\n press(\'a61\', \'Meta+Shift+t\')\n\nfocus(bid: str)\n Description: Focus the matching element.\n Examples:\n focus(\'b455\')\n\nclear(bid: str)\n Description: Clear the input field.\n Examples:\n clear(\'996\')\n\ndrag_and_drop(from_bid: str, to_bid: str)\n Description: Perform a drag & drop. Hover the element that will be dragged. Press left mouse button. Move mouse to the element that will receive the drop. Release left mouse button.\n Examples:\n drag_and_drop(\'56\', \'498\')\n\nupload_file(bid: str, file: str | list[str])\n Description: Click an element and wait for a "filechooser" event, then select one or multiple input files for upload. Relative file paths are resolved relative to the current working directory. An empty list clears the selected files.\n Examples:\n upload_file(\'572\', \'/home/user/my_receipt.pdf\')\n\n upload_file(\'63\', [\'/home/bob/Documents/image.jpg\', \'/home/bob/Documents/file.zip\'])\n'
}
},
'required': [
'code'
]
}
}
},
{
'type': 'function',
'function': {
'name': 'execute_ipython_cell',
'description': 'Run a cell of Python code in an IPython environment.\n* The assistant should define variables and import packages before using them.\n* The variable defined in the IPython environment will not be available outside the IPython environment (e.g., in terminal).\n',
'parameters': {
'type': 'object',
'properties': {
'code': {
'type': 'string',
'description': 'The Python code to execute. Supports magic commands like %pip.'
}
},
'required': [
'code'
]
}
}
},
{
'type': 'function',
'function': {
'name': 'str_replace_editor',
'description': 'Custom editing tool for viewing, creating and editing files in plain-text format\n* State is persistent across command calls and discussions with the user\n* If `path` is a file, `view` displays the result of applying `cat -n`. If `path` is a directory, `view` lists non-hidden files and directories up to 2 levels deep\n* The `create` command cannot be used if the specified `path` already exists as a file\n* If a `command` generates a long output, it will be truncated and marked with `<response clipped>`\n* The `undo_edit` command will revert the last edit made to the file at `path`\n\nNotes for using the `str_replace` command:\n* The `old_str` parameter should match EXACTLY one or more consecutive lines from the original file. Be mindful of whitespaces!\n* If the `old_str` parameter is not unique in the file, the replacement will not be performed. Make sure to include enough context in `old_str` to make it unique\n* The `new_str` parameter should contain the edited lines that should replace the `old_str`\n',
'parameters': {
'type': 'object',
'properties': {
'command': {
'description': 'The commands to run. Allowed options are: `view`, `create`, `str_replace`, `insert`, `undo_edit`.',
'enum': [
'view',
'create',
'str_replace',
'insert',
'undo_edit'
],
'type': 'string'
},
'path': {
'description': 'Absolute path to file or directory, e.g. `/workspace/file.py` or `/workspace`.',
'type': 'string'
},
'file_text': {
'description': 'Required parameter of `create` command, with the content of the file to be created.',
'type': 'string'
},
'old_str': {
'description': 'Required parameter of `str_replace` command containing the string in `path` to replace.',
'type': 'string'
},
'new_str': {
'description': 'Optional parameter of `str_replace` command containing the new string (if not given, no string will be added). Required parameter of `insert` command containing the string to insert.',
'type': 'string'
},
'insert_line': {
'description': 'Required parameter of `insert` command. The `new_str` will be inserted AFTER the line `insert_line` of `path`.',
'type': 'integer'
},
'view_range': {
'description': 'Optional parameter of `view` command when `path` points to a file. If none is given, the full file is shown. If provided, the file will be shown in the indicated line number range, e.g. [11, 12] will show lines 11 and 12. Indexing at 1 to start. Setting `[start_line, -1]` shows all lines from `start_line` to the end of the file.',
'items': {
'type': 'integer'
},
'type': 'array'
}
},
'required': [
'command',
'path'
]
}
}
}
]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting log! I was looking for this one:
{
'content': [
{
'type': 'text',
'text': 'The action has not been executed.\n[Error occurred in processing last action]'
}
],
'role': 'tool',
'cache_control': {
'type': 'ephemeral'
},
'tool_call_id': 'toolu_01EteZsAJEiMV1uJCWsaHcCQ',
'name': 'execute_bash'
},
I think this one is the ErrorObservation we make up here.
So:
- there was an exception during a runnable action
- the agent became STOPPED or ERROR
- we
reset
the controller:- stop tracking the action (
pending_action
) because it will never succeed, we know that now; - and invent an ErrorObs so that the agent history sent to the LLM still has its paired
tool_call_id
observation: a result of the tool call. This is there to address the error "could not find result tool block".
- stop tracking the action (
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about:
- please see [Bug]: Simplify prompt caching #6858 , it's overdue anyway, we can probably simplify prompt caching (maybe it even helps here?)
- we could disable it completely to see if we can replicate the error without it, to remove the clutter?
I have another example here where the cache directive is removed, but litellm still complained. I guess there is something else going on...
Back to the drawing board I guess...
I wonder if the litellm credentials are getting lost somehow... 🤔 |
A concrete example: When I use the litellm proxy as a base url and the API key, I get an error. When I use Anthropic directly (base_url=None), I don't get any error.
|
Oh that sounds familiar! Two things here please:
Either way, it sounds like the likelihood is that we might not have a bug, but litellm does. |
End-user friendly description of the problem this fixes or functionality that this introduces
Before this change, if the runtime went offline during a tool call, the system would enter an unrecoverable state where accessing the LLM would fail. (At least with Anthropic). The major change here is that we no longer consider tool call responses (Which may be errors) as cacheable.
Fixed an issue where the AI assistant could fail after recovering from a previous error, making the system more resilient and stable.
Give a summary of what the PR does, explaining any non-trivial design decisions
This PR modifies the prompt caching logic in
message_utils.py
to:Steps to Reproduce Error
I have only been able to reproduce this with the remote runtime.
I'm going to be testing the environment in which you run bash scripts. Please execute a bash script that prints "Hello World"
Please execute the same script again. I've stopped the bash environment so it should throw an error this time
. This will indeed yield an error telling you to refresh the page. NOTE: Currently refreshing the page does not clear this error (I have a separate PR in progress for that.). Right now you need to go back to the splash screen and wait for the Conversation to end (15 seconds by default).Please execute the same script again. I've restarted the bash environment so it should no longer throw an error
. You should see a message like this, and the conversation is unrecoverable:Behind the scenes, you get a stack trace like this:
Link of any specific issues this addresses
This addresses the error recovery stability issues reported in production environments.