Fix for issue Runtime Errors in Action Execution result in an Unrecoverable Conversation #6852

Draft · wants to merge 1 commit into base: main

Conversation

@tofarr (Collaborator) commented Feb 20, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

Before this change, if the runtime went offline during a tool call, the system would enter an unrecoverable state in which calls to the LLM would fail (at least with Anthropic). The major change here is that we no longer treat tool call responses (which may be errors) as cacheable.

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Fixed an issue where the AI assistant could fail after recovering from a previous error, making the system more resilient and stable.


Give a summary of what the PR does, explaining any non-trivial design decisions

This PR modifies the prompt caching logic in message_utils.py to:

  1. Skip tool call messages when applying prompt caching breakpoints
  2. Continue processing messages even after finding a breakpoint, ensuring we don't miss important context
  3. Use an early return once all breakpoints have been used, for better code clarity
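
For illustration, here is a minimal sketch of the revised breakpoint flow. It uses a simplified stand-in Message type, so the field names are assumptions and not the actual message_utils.py classes:

from dataclasses import dataclass

@dataclass
class Message:
    role: str
    tool_call_id: str | None = None
    cache_enabled: bool = False

def apply_prompt_caching(messages: list[Message], max_breakpoints: int = 3) -> None:
    # Walk the history from newest to oldest, marking recent user messages as
    # cacheable while skipping tool call responses entirely.
    remaining = max_breakpoints
    for message in reversed(messages):
        if remaining == 0:
            return  # early return once all breakpoints are used
        if message.tool_call_id:
            continue  # never place a caching breakpoint on a tool call response
        if message.role == 'user':
            message.cache_enabled = True
            remaining -= 1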

Steps to Reproduce the Error

I have only been able to reproduce this with the remote runtime.

  1. Create a new conversation with the prompt: I'm going to be testing the environment in which you run bash scripts. Please execute a bash script that prints "Hello World"
  2. Once the prompt completes, stop the runtime using the runtime API. e.g.:
curl -X POST https://runtime.staging.all-hands.dev/stop \
-H "X-API-Key: ******" \
-H "Content-Type: application/json" \
--data-binary @- << EOF
{
  "runtime_id": "olynpucajoqyxrzk"
}
EOF
  3. Prompt the agent again with: Please execute the same script again. I've stopped the bash environment so it should throw an error this time. This will yield an error telling you to refresh the page. NOTE: refreshing the page does not currently clear this error (I have a separate PR in progress for that); for now you need to go back to the splash screen and wait for the conversation to end (15 seconds by default).
  4. Re-enter the conversation and prompt it with: Please execute the same script again. I've restarted the bash environment so it should no longer throw an error. You should see a message like this, and the conversation is unrecoverable:
    [image: screenshot of the error message shown in the conversation]

Behind the scenes, you get a stack trace like this:

09:56:31 - openhands:ERROR: agent_controller.py:240 - [Agent Controller aca8cbb8acb94687ae53d4302bf502ef] Error while running the agent (session ID: aca8cbb8acb94687ae53d4302bf502ef): litellm.InternalServerError: AnthropicException - {"error":{"message":"litellm.APIConnectionError: BedrockException - {"message":"You do not have access to explicit prompt caching"}\nReceived Model Group=claude-3-5-sonnet-20241022\nAvailable Model Group Fallbacks=['anthropic/claude-3-5-sonnet-20241022']\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"messages.26: Did not find 1 tool_result block(s) at the beginning of this message. Messages following tool_use blocks must begin with a matching number of tool_result blocks."}}\nReceived Model Group=anthropic/claude-3-5-sonnet-20241022\nAvailable Model Group Fallbacks=['anthropic/claude-3-5-sonnet-20241022']\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"messages.26: Did not find 1 tool_result block(s) at the beginning of this message. Messages following tool_use blocks must begin with a matching number of tool_result blocks."}} LiteLLM Retried: 2 times, LiteLLM Max Retries: 3 LiteLLM Retried: 2 times, LiteLLM Max Retries: 3","type":null,"param":null,"code":"500"}}. Handle with litellm.InternalServerError.. Traceback: Traceback (most recent call last):

File "/app/.venv/lib/python3.12/site-packages/litellm/llms/anthropic/chat/handler.py", line 412, in completion

response = client.post(

^^^^^^^^^^^^

File "/app/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/http_handler.py", line 557, in post

raise e

File "/app/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/http_handler.py", line 538, in post

response.raise_for_status()

File "/app/.venv/lib/python3.12/site-packages/httpx/_models.py", line 829, in raise_for_status

raise HTTPStatusError(message, request=request, response=self)

httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'https://llm-proxy.staging.all-hands.dev/v1/messages'

For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/app/.venv/lib/python3.12/site-packages/litellm/main.py", line 1878, in completion

response = anthropic_chat_completions.completion(

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

File "/app/.venv/lib/python3.12/site-packages/litellm/llms/anthropic/chat/handler.py", line 427, in completion

raise AnthropicError(

litellm.llms.anthropic.common_utils.AnthropicError: {"error":{"message":"litellm.APIConnectionError: BedrockException - {"message":"You do not have access to explicit prompt caching"}\nReceived Model Group=claude-3-5-sonnet-20241022\nAvailable Model Group Fallbacks=['anthropic/claude-3-5-sonnet-20241022']\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"messages.26: Did not find 1 tool_result block(s) at the beginning of this message. Messages following tool_use blocks must begin with a matching number of tool_result blocks."}}\nReceived Model Group=anthropic/claude-3-5-sonnet-20241022\nAvailable Model Group Fallbacks=['anthropic/claude-3-5-sonnet-20241022']\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"messages.26: Did not find 1 tool_result block(s) at the beginning of this message. Messages following tool_use blocks must begin with a matching number of tool_result blocks."}} LiteLLM Retried: 2 times, LiteLLM Max Retries: 3 LiteLLM Retried: 2 times, LiteLLM Max Retries: 3","type":null,"param":null,"code":"500"}}

Link of any specific issues this addresses

This addresses the error recovery stability issues reported in production environments.

@tofarr tofarr changed the title Fix for issue where LLM errors out after recovery Fix for issue Runtime Errors in Action Execution result in an Unrecoverable Conversation Feb 20, 2025
@tofarr tofarr marked this pull request as ready for review February 20, 2025 13:25
@enyst (Collaborator) commented Feb 20, 2025

From the log:

messages.26:
Did not find 1 tool_result block(s) at the beginning of this message.
Messages following tool_use blocks must begin with a matching number of tool_result blocks.

I think it's complaining that there is an action without an observation, or an observation without a tool_use_id. It probably has an "unpaired" ErrorObs, or nothing at all. 🤔

Could you enable litellm verbose logging? E.g. replace the logging setup here with something like this:

# Exclude LiteLLM from logging output
# logging.getLogger('LiteLLM').disabled = True
logging.getLogger('LiteLLM Router').disabled = True
logging.getLogger('LiteLLM Proxy').disabled = True
litellm.set_verbose = True

Then it should print out in console exactly what it sends to the LLM API.

else:
    break
if message.tool_call_id:
    continue
Collaborator:

I think that, in a typical agent history, the action-observation 'pairs' linked by tool_call_id are the vast majority, so I'm afraid this change means we effectively have no more prompt caching. 🤔

I think prompt caching is not really the problem we are seeing here, but I could be wrong. Maybe you can disable it completely from configuration to see if there is still an error (one possible way to do that is sketched below).
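
If disabling prompt caching for a quick test is the goal, something like the following might work. This is only a sketch: the caching_prompt flag on LLMConfig is an assumption and should be checked against the actual config fields.

from pydantic import SecretStr
from openhands.core.config.llm_config import LLMConfig
from openhands.llm.llm import LLM

# Sketch only: `caching_prompt` is assumed to be the LLMConfig flag that gates
# the cache_control annotations; verify the field name before relying on it.
llm_config = LLMConfig(
    model='claude-3-5-sonnet-20241022',
    api_key=SecretStr('******'),
    caching_prompt=False,
)
my_llm = LLM(llm_config)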

@tofarr (Collaborator, Author) Feb 20, 2025:

When I was trying to figure this out, I added some logging in a local branch to see exactly what was being passed to LiteLLM here. The output was very verbose, but over multiple invocations I saw that the difference between a working sample and a failing one was the cache directive. One such example is below:

{
  'messages': [
    {
      'content': [
        {
          'type': 'text',
          'text': 'You are OpenHands agent, a helpful AI assistant that can interact with a computer to solve tasks.\n<IMPORTANT>\n* If user provides a path, you should NOT assume it\'s relative to the current working directory. Instead, you should explore the file system to find the file before working on it.\n* When configuring git credentials, use "openhands" as the user.name and "openhands@all-hands.dev" as the user.email by ******, unless explicitly instructed otherwise.\n* The assistant MUST NOT include comments in the code unless they are necessary to describe non-obvious behavior.\n</IMPORTANT>',
          'cache_control': {
            'type': 'ephemeral'
          }
        }
      ],
      'role': 'system'
    },
    {
      'content': [
        {
          'type': 'text',
          'text': '<RUNTIME_INFORMATION>\nThe user has access to the following hosts for accessing a web application,\neach of which has a corresponding port:\n* https://work-1-kcnvachtmtlsuyzj.staging-runtime.all-hands.dev (port 12000)\n* https://work-2-kcnvachtmtlsuyzj.staging-runtime.all-hands.dev (port 12001)\n\nWhen starting a web server, use the corresponding ports. You should also\nset any options to allow iframes and CORS requests, and allow the server to\nbe accessed from any host (e.g. 0.0.0.0).\n</RUNTIME_INFORMATION>'
        },
        {
          'type': 'text',
          'text': 'I\'m going to be testing the environment in which you run bash scripts.  Please execute a bash script that prints "Hello World"'
        }
      ],
      'role': 'user'
    },
    {
      'content': [
        
      ],
      'role': 'assistant',
      'tool_calls': [
        {
          'id': 'toolu_01ScT97bN9iq3KsDpYNY5Ppt',
          'type': 'function',
          'function': {
            'name': 'execute_bash',
            'arguments': '{"command": "echo \\"Hello World\\""}'
          }
        }
      ]
    },
    {
      'content': [
        {
          'type': 'text',
          'text': 'Hello World\n[The command completed with exit code 0.]\n[Current working directory: /workspace]\n[Python interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]\n[Command finished with exit code 0]'
        }
      ],
      'role': 'tool',
      'tool_call_id': 'toolu_01ScT97bN9iq3KsDpYNY5Ppt',
      'name': 'execute_bash'
    },
    {
      'content': [
        {
          'type': 'text',
          'text': 'I\'ve executed a simple bash command that prints "Hello World" using the `echo` command. The command completed successfully with exit code 0, and we can see that:\n1. The current working directory is `/workspace`\n2. There\'s a Python interpreter available at `/openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python`\n\nIs there anything specific about the bash environment you\'d like me to test?'
        }
      ],
      'role': 'assistant'
    },
    {
      'content': [
        {
          'type': 'text',
          'text': "Please execute the same script again. I've stopped the bash environment so it should throw an error this time",
          'cache_control': {
            'type': 'ephemeral'
          }
        }
      ],
      'role': 'user'
    },
    {
      'content': [
        
      ],
      'role': 'assistant',
      'tool_calls': [
        {
          'id': 'toolu_01EteZsAJEiMV1uJCWsaHcCQ',
          'type': 'function',
          'function': {
            'name': 'execute_bash',
            'arguments': '{"command": "echo \\"Hello World\\""}'
          }
        }
      ]
    },
    {
      'content': [
        {
          'type': 'text',
          'text': 'The action has not been executed.\n[Error occurred in processing last action]'
        }
      ],
      'role': 'tool',
      'cache_control': {
        'type': 'ephemeral'
      },
      'tool_call_id': 'toolu_01EteZsAJEiMV1uJCWsaHcCQ',
      'name': 'execute_bash'
    },
    {
      'content': [
        {
          'type': 'text',
          'text': "Please execute the same script again. I've restarted the bash environment so it should no longer throw an error",
          'cache_control': {
            'type': 'ephemeral'
          }
        }
      ],
      'role': 'user'
    }
  ],
  'tools': [
    {
      'type': 'function',
      'function': {
        'name': 'execute_bash',
        'description': 'Execute a bash command in the terminal.\n* Long running commands: For commands that may run indefinitely, it should be run in the background and the output should be redirected to a file, e.g. command = `python3 app.py > server.log 2>&1 &`.\n* Interact with running process: If a bash command returns exit code `-1`, this means the process is not yet finished. By setting `is_input` to `true`, the assistant can interact with the running process and send empty `command` to retrieve any additional logs, or send additional text (set `command` to the text) to STDIN of the running process, or send command like `C-c` (Ctrl+C), `C-d` (Ctrl+D), `C-z` (Ctrl+Z) to interrupt the process.\n* One command at a time: You can only execute one bash command at a time. If you need to run multiple commands sequentially, you can use `&&` or `;` to chain them together.\n',
        'parameters': {
          'type': 'object',
          'properties': {
            'command': {
              'type': 'string',
              'description': 'The bash command to execute. Can be empty string to view additional logs when previous exit code is `-1`. Can be `C-c` (Ctrl+C) to interrupt the currently running process. Note: You can only execute one bash command at a time. If you need to run multiple commands sequentially, you can use `&&` or `;` to chain them together.'
            },
            'is_input': {
              'type': 'string',
              'description': 'If True, the command is an input to the running process. If False, the command is a bash command to be executed in the terminal. Default is False.',
              'enum': [
                'true',
                'false'
              ]
            }
          },
          'required': [
            'command'
          ]
        }
      }
    },
    {
      'type': 'function',
      'function': {
        'name': 'finish',
        'description': 'Finish the interaction when the task is complete OR if the assistant cannot proceed further with the task.'
      }
    },
    {
      'type': 'function',
      'function': {
        'name': 'web_read',
        'description': 'Read (convert to markdown) content from a webpage. You should prefer using the `web_read` tool over the `browser` tool, but do use the `browser` tool if you need to interact with a webpage (e.g., click a button, fill out a form, etc.).\n\nYou may use the `web_read` tool to read content from a webpage, and even search the webpage content using a Google search query (e.g., url=`https://www.google.com/search?q=YOUR_QUERY`).\n',
        'parameters': {
          'type': 'object',
          'properties': {
            'url': {
              'type': 'string',
              'description': 'The URL of the webpage to read. You can also use a Google search query here (e.g., `https://www.google.com/search?q=YOUR_QUERY`).'
            }
          },
          'required': [
            'url'
          ]
        }
      }
    },
    {
      'type': 'function',
      'function': {
        'name': 'browser',
        'description': 'Interact with the browser using Python code. Use it ONLY when you need to interact with a webpage.\n\nSee the description of "code" parameter for more details.\n\nMultiple actions can be provided at once, but will be executed sequentially without any feedback from the page.\nMore than 2-3 actions usually leads to failure or unexpected behavior. Example:\nfill(\'a12\', \'example with "quotes"\')\nclick(\'a51\')\nclick(\'48\', button=\'middle\', modifiers=[\'Shift\'])\n',
        'parameters': {
          'type': 'object',
          'properties': {
            'code': {
              'type': 'string',
              'description': 'The Python code that interacts with the browser.\n\nThe following 15 functions are available. Nothing else is supported.\n\ngoto(url: str)\n    Description: Navigate to a url.\n    Examples:\n        goto(\'http://www.example.com\')\n\ngo_back()\n    Description: Navigate to the previous page in history.\n    Examples:\n        go_back()\n\ngo_forward()\n    Description: Navigate to the next page in history.\n    Examples:\n        go_forward()\n\nnoop(wait_ms: float = 1000)\n    Description: Do nothing, and optionally wait for the given time (in milliseconds).\n    You can use this to get the current page content and/or wait for the page to load.\n    Examples:\n        noop()\n\n        noop(500)\n\nscroll(delta_x: float, delta_y: float)\n    Description: Scroll horizontally and vertically. Amounts in pixels, positive for right or down scrolling, negative for left or up scrolling. Dispatches a wheel event.\n    Examples:\n        scroll(0, 200)\n\n        scroll(-50.2, -100.5)\n\nfill(bid: str, value: str)\n    Description: Fill out a form field. It focuses the element and triggers an input event with the entered text. It works for <input>, <textarea> and [contenteditable] elements.\n    Examples:\n        fill(\'237\', \'example value\')\n\n        fill(\'45\', \'multi-line\nexample\')\n\n        fill(\'a12\', \'example with "quotes"\')\n\nselect_option(bid: str, options: str | list[str])\n    Description: Select one or multiple options in a <select> element. You can specify option value or label to select. Multiple options can be selected.\n    Examples:\n        select_option(\'a48\', \'blue\')\n\n        select_option(\'c48\', [\'red\', \'green\', \'blue\'])\n\nclick(bid: str, button: Literal[\'left\', \'middle\', \'right\'] = \'left\', modifiers: list[typing.Literal[\'Alt\', \'Control\', \'ControlOrMeta\', \'Meta\', \'Shift\']] = [])\n    Description: Click an element.\n    Examples:\n        click(\'a51\')\n\n        click(\'b22\', button=\'right\')\n\n        click(\'48\', button=\'middle\', modifiers=[\'Shift\'])\n\ndblclick(bid: str, button: Literal[\'left\', \'middle\', \'right\'] = \'left\', modifiers: list[typing.Literal[\'Alt\', \'Control\', \'ControlOrMeta\', \'Meta\', \'Shift\']] = [])\n    Description: Double click an element.\n    Examples:\n        dblclick(\'12\')\n\n        dblclick(\'ca42\', button=\'right\')\n\n        dblclick(\'178\', button=\'middle\', modifiers=[\'Shift\'])\n\nhover(bid: str)\n    Description: Hover over an element.\n    Examples:\n        hover(\'b8\')\n\npress(bid: str, key_comb: str)\n    Description: Focus the matching element and press a combination of keys. It accepts the logical key names that are emitted in the keyboardEvent.key property of the keyboard events: Backquote, Minus, Equal, Backslash, Backspace, Tab, Delete, Escape, ArrowDown, End, Enter, Home, Insert, PageDown, PageUp, ArrowRight, ArrowUp, F1 - F12, Digit0 - Digit9, KeyA - KeyZ, etc. You can alternatively specify a single character you\'d like to produce such as "a" or "#". Following modification shortcuts are also supported: Shift, Control, Alt, Meta, ShiftLeft, ControlOrMeta. 
ControlOrMeta resolves to Control on Windows and Linux and to Meta on macOS.\n    Examples:\n        press(\'88\', \'Backspace\')\n\n        press(\'a26\', \'ControlOrMeta+a\')\n\n        press(\'a61\', \'Meta+Shift+t\')\n\nfocus(bid: str)\n    Description: Focus the matching element.\n    Examples:\n        focus(\'b455\')\n\nclear(bid: str)\n    Description: Clear the input field.\n    Examples:\n        clear(\'996\')\n\ndrag_and_drop(from_bid: str, to_bid: str)\n    Description: Perform a drag & drop. Hover the element that will be dragged. Press left mouse button. Move mouse to the element that will receive the drop. Release left mouse button.\n    Examples:\n        drag_and_drop(\'56\', \'498\')\n\nupload_file(bid: str, file: str | list[str])\n    Description: Click an element and wait for a "filechooser" event, then select one or multiple input files for upload. Relative file paths are resolved relative to the current working directory. An empty list clears the selected files.\n    Examples:\n        upload_file(\'572\', \'/home/user/my_receipt.pdf\')\n\n        upload_file(\'63\', [\'/home/bob/Documents/image.jpg\', \'/home/bob/Documents/file.zip\'])\n'
            }
          },
          'required': [
            'code'
          ]
        }
      }
    },
    {
      'type': 'function',
      'function': {
        'name': 'execute_ipython_cell',
        'description': 'Run a cell of Python code in an IPython environment.\n* The assistant should define variables and import packages before using them.\n* The variable defined in the IPython environment will not be available outside the IPython environment (e.g., in terminal).\n',
        'parameters': {
          'type': 'object',
          'properties': {
            'code': {
              'type': 'string',
              'description': 'The Python code to execute. Supports magic commands like %pip.'
            }
          },
          'required': [
            'code'
          ]
        }
      }
    },
    {
      'type': 'function',
      'function': {
        'name': 'str_replace_editor',
        'description': 'Custom editing tool for viewing, creating and editing files in plain-text format\n* State is persistent across command calls and discussions with the user\n* If `path` is a file, `view` displays the result of applying `cat -n`. If `path` is a directory, `view` lists non-hidden files and directories up to 2 levels deep\n* The `create` command cannot be used if the specified `path` already exists as a file\n* If a `command` generates a long output, it will be truncated and marked with `<response clipped>`\n* The `undo_edit` command will revert the last edit made to the file at `path`\n\nNotes for using the `str_replace` command:\n* The `old_str` parameter should match EXACTLY one or more consecutive lines from the original file. Be mindful of whitespaces!\n* If the `old_str` parameter is not unique in the file, the replacement will not be performed. Make sure to include enough context in `old_str` to make it unique\n* The `new_str` parameter should contain the edited lines that should replace the `old_str`\n',
        'parameters': {
          'type': 'object',
          'properties': {
            'command': {
              'description': 'The commands to run. Allowed options are: `view`, `create`, `str_replace`, `insert`, `undo_edit`.',
              'enum': [
                'view',
                'create',
                'str_replace',
                'insert',
                'undo_edit'
              ],
              'type': 'string'
            },
            'path': {
              'description': 'Absolute path to file or directory, e.g. `/workspace/file.py` or `/workspace`.',
              'type': 'string'
            },
            'file_text': {
              'description': 'Required parameter of `create` command, with the content of the file to be created.',
              'type': 'string'
            },
            'old_str': {
              'description': 'Required parameter of `str_replace` command containing the string in `path` to replace.',
              'type': 'string'
            },
            'new_str': {
              'description': 'Optional parameter of `str_replace` command containing the new string (if not given, no string will be added). Required parameter of `insert` command containing the string to insert.',
              'type': 'string'
            },
            'insert_line': {
              'description': 'Required parameter of `insert` command. The `new_str` will be inserted AFTER the line `insert_line` of `path`.',
              'type': 'integer'
            },
            'view_range': {
              'description': 'Optional parameter of `view` command when `path` points to a file. If none is given, the full file is shown. If provided, the file will be shown in the indicated line number range, e.g. [11, 12] will show lines 11 and 12. Indexing at 1 to start. Setting `[start_line, -1]` shows all lines from `start_line` to the end of the file.',
              'items': {
                'type': 'integer'
              },
              'type': 'array'
            }
          },
          'required': [
            'command',
            'path'
          ]
        }
      }
    }
  ]
}

Collaborator:

That's an interesting log! I was looking for this one:

{
      'content': [
        {
          'type': 'text',
          'text': 'The action has not been executed.\n[Error occurred in processing last action]'
        }
      ],
      'role': 'tool',
      'cache_control': {
        'type': 'ephemeral'
      },
      'tool_call_id': 'toolu_01EteZsAJEiMV1uJCWsaHcCQ',
      'name': 'execute_bash'
    },

I think this one is the ErrorObservation we make up here.

So:

  • there was an exception during a runnable action
  • the agent became STOPPED or ERROR
  • we reset the controller:
    • stop tracking the action (pending_action), because we now know it will never succeed;
    • and invent an ErrorObs so that the agent history sent to the LLM still has a paired observation for the tool_call_id, i.e. a result of the tool call. This is there to address the "could not find result tool block" error (the required pairing is illustrated in the sketch below).
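
To make the pairing requirement concrete, here is an illustrative pair of messages (the tool_call_id is made up): Anthropic rejects a history in which an assistant tool_use is not immediately followed by a tool_result with a matching id, which is exactly what the synthesized ErrorObservation preserves.

assistant_tool_use = {
    'role': 'assistant',
    'content': [],
    'tool_calls': [{
        'id': 'toolu_EXAMPLE',  # made-up id for illustration
        'type': 'function',
        'function': {'name': 'execute_bash', 'arguments': '{"command": "echo \\"Hello World\\""}'},
    }],
}
synthesized_error_observation = {
    'role': 'tool',
    'tool_call_id': 'toolu_EXAMPLE',  # must match the id of the tool_use above
    'name': 'execute_bash',
    'content': [{'type': 'text', 'text': 'The action has not been executed.\n[Error occurred in processing last action]'}],
}
messages = [assistant_tool_use, synthesized_error_observation]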

Collaborator:

What about:

  • please see [Bug]: Simplify prompt caching #6858; it's overdue anyway, and we can probably simplify prompt caching (maybe it even helps here?)
  • we could disable it completely to see whether we can replicate the error without it, to remove the clutter?

@tofarr tofarr marked this pull request as draft February 20, 2025 15:59
@tofarr (Collaborator, Author) commented Feb 20, 2025

I have another example here where the cache directive is removed, but litellm still complains. I guess there is something else going on...

'messages': [
    {
      'content': [
        {
          'type': 'text',
          'text': 'You are OpenHands agent, a helpful AI assistant that can interact with a computer to solve tasks.\n<IMPORTANT>\n* If user provides a path, you should NOT assume it\'s relative to the current working directory. Instead, you should explore the file system to find the file before working on it.\n* When configuring git credentials, use "openhands" as the user.name and "openhands@all-hands.dev" as the user.email by ******, unless explicitly instructed otherwise.\n* The assistant MUST NOT include comments in the code unless they are necessary to describe non-obvious behavior.\n</IMPORTANT>',
          'cache_control': {
            'type': 'ephemeral'
          }
        }
      ],
      'role': 'system'
    },
    {
      'content': [
        {
          'type': 'text',
          'text': '<RUNTIME_INFORMATION>\nThe user has access to the following hosts for accessing a web application,\neach of which has a corresponding port:\n* https://work-1-iggsbuczikcteokp.staging-runtime.all-hands.dev (port 12000)\n* https://work-2-iggsbuczikcteokp.staging-runtime.all-hands.dev (port 12001)\n\nWhen starting a web server, use the corresponding ports. You should also\nset any options to allow iframes and CORS requests, and allow the server to\nbe accessed from any host (e.g. 0.0.0.0).\n</RUNTIME_INFORMATION>'
        },
        {
          'type': 'text',
          'text': 'I\'m going to be testing the environment in which you run bash scripts.  Please execute a bash script that prints "Hello World"',
          'cache_control': {
            'type': 'ephemeral'
          }
        }
      ],
      'role': 'user'
    },
    {
      'content': [
        
      ],
      'role': 'assistant',
      'tool_calls': [
        {
          'id': 'toolu_01YD9jp2Pjwadjqzc8BPSz27',
          'type': 'function',
          'function': {
            'name': 'execute_bash',
            'arguments': '{"command": "echo \\"Hello World\\""}'
          }
        }
      ]
    },
    {
      'content': [
        {
          'type': 'text',
          'text': 'Hello World\n[The command completed with exit code 0.]\n[Current working directory: /workspace]\n[Python interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]\n[Command finished with exit code 0]'
        }
      ],
      'role': 'tool',
      'tool_call_id': 'toolu_01YD9jp2Pjwadjqzc8BPSz27',
      'name': 'execute_bash'
    },
    {
      'content': [
        {
          'type': 'text',
          'text': 'I\'ve executed a simple bash command that prints "Hello World" using the `echo` command. The command completed successfully with exit code 0, and we can see that:\n1. The current working directory is `/workspace`\n2. There\'s a Python interpreter available at `/openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python`\n\nIs there anything specific about the bash environment you\'d like me to test?'
        }
      ],
      'role': 'assistant'
    },
    {
      'content': [
        {
          'type': 'text',
          'text': "Please execute the same script again. I've stopped the bash environment so it should throw an error this time",
          'cache_control': {
            'type': 'ephemeral'
          }
        }
      ],
      'role': 'user'
    },
    {
      'content': [
        
      ],
      'role': 'assistant',
      'tool_calls': [
        {
          'id': 'toolu_01WRLkgkEYdASo6rJJfQKTAh',
          'type': 'function',
          'function': {
            'name': 'execute_bash',
            'arguments': '{"command": "echo \\"Hello World\\""}'
          }
        }
      ]
    },
    {
      'content': [
        {
          'type': 'text',
          'text': 'The action has not been executed.\n[Error occurred in processing last action]'
        }
      ],
      'role': 'tool',
      'tool_call_id': 'toolu_01WRLkgkEYdASo6rJJfQKTAh',
      'name': 'execute_bash'
    },
    {
      'content': [
        {
          'type': 'text',
          'text': "Please execute the same script again. I've restarted the bash environment so it should no longer throw an error",
          'cache_control': {
            'type': 'ephemeral'
          }
        }
      ],
      'role': 'user'
    }
  ],

[image: screenshot of the error shown in the UI]

Back to the drawing board I guess...

error":{"message":"litellm.APIConnectionError: BedrockException - {\"message\":\"You do not have access to explicit prompt caching\"}\nReceived Model Group=claude-3-5-sonnet-20241022\nAvailable Model Group Fallbacks=['anthropic/claude-3-5-sonnet-20241022']\nError doing the fallback: litellm.BadRequestError: AnthropicException - {\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"message\":\"messages.6: Did not find 1 `tool_result` block(s) at the beginning of this message. Messages following `tool_use` blocks must begin with a matching number of `tool_result` blocks.\"}}\nReceived Model Group=anthropic/claude-3-5-sonnet-20241022\nAvailable Model Group Fallbacks=['anthropic/claude-3-5-sonnet-20241022']\nError doing the fallback: litellm.BadRequestError: AnthropicException - {\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"message\":\"messages.6: Did not find 1 `tool_result` block(s) at the beginning of this message. Messages following `tool_use` blocks must begin with a matching number of `tool_result` blocks.\"}} LiteLLM Retried: 2 times, LiteLLM Max Retries: 3 LiteLLM Retried: 2 times, LiteLLM Max Retries: 3","type":null,"param":null,"code":"500"}}. Handle with `litellm.InternalServerError`.. Traceback: Traceback (most recent call last):
  File "/app/.venv/lib/python3.12/site-packages/litellm/llms/anthropic/chat/handler.py", line 412, in completion
    response = client.post(
               ^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/http_handler.py", line 557, in post
    raise e
  File "/app/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/http_handler.py", line 538, in post
    response.raise_for_status()
  File "/app/.venv/lib/python3.12/site-packages/httpx/_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'https://llm-proxy.staging.all-hands.dev/v1/messages'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/.venv/lib/python3.12/site-packages/litellm/main.py", line 1878, in completion
    response = anthropic_chat_completions.completion(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/.venv/lib/python3.12/site-packages/litellm/llms/anthropic/chat/handler.py", line 427, in completion
    raise AnthropicError(
litellm.llms.anthropic.common_utils.AnthropicError: {"error":{"message":"litellm.APIConnectionError: BedrockException - {\"message\":\"You do not have access to explicit prompt caching\"}\nReceived Model Group=claude-3-5-sonnet-20241022\nAvailable Model Group Fallbacks=['anthropic/claude-3-5-sonnet-20241022']\nError doing the fallback: litellm.BadRequestError: AnthropicException - {\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"message\":\"messages.6: Did not find 1 `tool_result` block(s) at the beginning of this message. Messages following `tool_use` blocks must begin with a matching number of `tool_result` blocks.\"}}\nReceived Model Group=anthropic/claude-3-5-sonnet-20241022\nAvailable Model Group Fallbacks=['anthropic/claude-3-5-sonnet-20241022']\nError doing the fallback: litellm.BadRequestError: AnthropicException - {\"type\":\"error\",\"error\":{\"type\":\"invalid_request_error\",\"message\":\"messages.6: Did not find 1 `tool_result` block(s) at the beginning of this message. Messages following `tool_use` blocks must begin with a matching number of `tool_result` blocks.\"}} LiteLLM Retried: 2 times, LiteLLM Max Retries: 3 LiteLLM Retried: 2 times, LiteLLM Max Retries: 3","type":null,"param":null,"code":"500"}}

I wonder if the litellm credentials are getting lost somehow... 🤔

@tofarr (Collaborator, Author) commented Feb 20, 2025

A concrete example:

When I use the litellm proxy as a base url and the API key, I get an error. When I use Anthropic directly (base_url=None), I don't get any error.

base_url = 'https://llm-proxy.staging.all-hands.dev'
api_key = '******'

from openhands.core.config.llm_config import LLMConfig
from pydantic import SecretStr
llm_config = LLMConfig(
  model='claude-3-5-sonnet-20241022',
  api_key=SecretStr(api_key),
  base_url=base_url,
)
from openhands.llm.llm import LLM
my_llm = LLM(llm_config)
params = {
  'messages': [
    {
      'content': [
        {
          'type': 'text',
          'text': 'You are OpenHands agent, a helpful AI assistant that can interact with a computer to solve tasks.\n<IMPORTANT>\n* If user provides a path, you should NOT assume it\'s relative to the current working directory. Instead, you should explore the file system to find the file before working on it.\n* When configuring git credentials, use "openhands" as the user.name and "openhands@all-hands.dev" as the user.email by ******, unless explicitly instructed otherwise.\n* The assistant MUST NOT include comments in the code unless they are necessary to describe non-obvious behavior.\n</IMPORTANT>',
        }
      ],
      'role': 'system'
    },
    {
      'content': [
        {
          'type': 'text',
          'text': '<RUNTIME_INFORMATION>\nThe user has access to the following hosts for accessing a web application,\neach of which has a corresponding port:\n* https://work-1-ewqvuejkdmbqiixb.staging-runtime.all-hands.dev (port 12000)\n* https://work-2-ewqvuejkdmbqiixb.staging-runtime.all-hands.dev (port 12001)\n\nWhen starting a web server, use the corresponding ports. You should also\nset any options to allow iframes and CORS requests, and allow the server to\nbe accessed from any host (e.g. 0.0.0.0).\n</RUNTIME_INFORMATION>'
        },
        {
          'type': 'text',
          'text': 'I\'m going to be testing the environment in which you run bash scripts.  Please execute a bash script that prints "Hello World"',
        }
      ],
      'role': 'user'
    },
    {
      'content': [],
      'role': 'assistant',
      'tool_calls': [
        {
          'id': 'toolu_01NfSAsfjPLS1kVnZhxWmsGR',
          'type': 'function',
          'function': {
            'name': 'execute_bash',
            'arguments': '{"command": "echo \\"Hello World\\""}'
          }
        }
      ]
    },
    {
      'content': [
        {
          'type': 'text',
          'text': 'Hello World\n[The command completed with exit code 0.]\n[Current working directory: /workspace]\n[Python interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]\n[Command finished with exit code 0]'
        }
      ],
      'role': 'tool',
      'tool_call_id': 'toolu_01NfSAsfjPLS1kVnZhxWmsGR',
      'name': 'execute_bash'
    },
    {
      'content': [
        {
          'type': 'text',
          'text': 'I\'ve executed a simple bash command that prints "Hello World" to the terminal. The command completed successfully with exit code 0, and we can see that:\n1. The current working directory is `/workspace`\n2. There\'s a Python interpreter available at `/openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python`\n\nIs there anything specific about the bash environment you\'d like me to test?'
        }
      ],
      'role': 'assistant'
    },
    {
      'content': [
        {
          'type': 'text',
          'text': "Please execute the same script again. I've stopped the bash environment so it should throw an error this time",
        }
      ],
      'role': 'user'
    },
    {
      'content': [],
      'role': 'assistant',
      'tool_calls': [
        {
          'id': 'toolu_018NhYVRvCBngWinFJfRsDSY',
          'type': 'function',
          'function': {
            'name': 'execute_bash',
            'arguments': '{"command": "echo \\"Hello World\\""}'
          }
        }
      ]
    },
    {
      'content': [
        {
          'type': 'text',
          'text': 'The action has not been executed.\n[Error occurred in processing last action]'
        }
      ],
      'role': 'tool',
      'tool_call_id': 'toolu_018NhYVRvCBngWinFJfRsDSY',
      'name': 'execute_bash'
    },
    {
      'content': [
        {
          'type': 'text',
          'text': "Please execute the same script again. I've restarted the bash environment so it should no longer throw an error",
        }
      ],
      'role': 'user'
    }
  ],
  'tools': [
    {
      'type': 'function',
      'function': {
        'name': 'execute_bash',
        'description': 'Execute a bash command in the terminal.\n* Long running commands: For commands that may run indefinitely, it should be run in the background and the output should be redirected to a file, e.g. command = `python3 app.py > server.log 2>&1 &`.\n* Interact with running process: If a bash command returns exit code `-1`, this means the process is not yet finished. By setting `is_input` to `true`, the assistant can interact with the running process and send empty `command` to retrieve any additional logs, or send additional text (set `command` to the text) to STDIN of the running process, or send command like `C-c` (Ctrl+C), `C-d` (Ctrl+D), `C-z` (Ctrl+Z) to interrupt the process.\n* One command at a time: You can only execute one bash command at a time. If you need to run multiple commands sequentially, you can use `&&` or `;` to chain them together.\n',
        'parameters': {
          'type': 'object',
          'properties': {
            'command': {
              'type': 'string',
              'description': 'The bash command to execute. Can be empty string to view additional logs when previous exit code is `-1`. Can be `C-c` (Ctrl+C) to interrupt the currently running process. Note: You can only execute one bash command at a time. If you need to run multiple commands sequentially, you can use `&&` or `;` to chain them together.'
            },
            'is_input': {
              'type': 'string',
              'description': 'If True, the command is an input to the running process. If False, the command is a bash command to be executed in the terminal. Default is False.',
              'enum': [
                'true',
                'false'
              ]
            }
          },
          'required': [
            'command'
          ]
        }
      }
    },
    {
      'type': 'function',
      'function': {
        'name': 'finish',
        'description': 'Finish the interaction when the task is complete OR if the assistant cannot proceed further with the task.'
      }
    },
    {
      'type': 'function',
      'function': {
        'name': 'web_read',
        'description': 'Read (convert to markdown) content from a webpage. You should prefer using the `web_read` tool over the `browser` tool, but do use the `browser` tool if you need to interact with a webpage (e.g., click a button, fill out a form, etc.).\n\nYou may use the `web_read` tool to read content from a webpage, and even search the webpage content using a Google search query (e.g., url=`https://www.google.com/search?q=YOUR_QUERY`).\n',
        'parameters': {
          'type': 'object',
          'properties': {
            'url': {
              'type': 'string',
              'description': 'The URL of the webpage to read. You can also use a Google search query here (e.g., `https://www.google.com/search?q=YOUR_QUERY`).'
            }
          },
          'required': [
            'url'
          ]
        }
      }
    },
    {
      'type': 'function',
      'function': {
        'name': 'browser',
        'description': 'Interact with the browser using Python code. Use it ONLY when you need to interact with a webpage.\n\nSee the description of "code" parameter for more details.\n\nMultiple actions can be provided at once, but will be executed sequentially without any feedback from the page.\nMore than 2-3 actions usually leads to failure or unexpected behavior. Example:\nfill(\'a12\', \'example with "quotes"\')\nclick(\'a51\')\nclick(\'48\', button=\'middle\', modifiers=[\'Shift\'])\n',
        'parameters': {
          'type': 'object',
          'properties': {
            'code': {
              'type': 'string',
              'description': 'The Python code that interacts with the browser.\n\nThe following 15 functions are available. Nothing else is supported.\n\ngoto(url: str)\n    Description: Navigate to a url.\n    Examples:\n        goto(\'http://www.example.com\')\n\ngo_back()\n    Description: Navigate to the previous page in history.\n    Examples:\n        go_back()\n\ngo_forward()\n    Description: Navigate to the next page in history.\n    Examples:\n        go_forward()\n\nnoop(wait_ms: float = 1000)\n    Description: Do nothing, and optionally wait for the given time (in milliseconds).\n    You can use this to get the current page content and/or wait for the page to load.\n    Examples:\n        noop()\n\n        noop(500)\n\nscroll(delta_x: float, delta_y: float)\n    Description: Scroll horizontally and vertically. Amounts in pixels, positive for right or down scrolling, negative for left or up scrolling. Dispatches a wheel event.\n    Examples:\n        scroll(0, 200)\n\n        scroll(-50.2, -100.5)\n\nfill(bid: str, value: str)\n    Description: Fill out a form field. It focuses the element and triggers an input event with the entered text. It works for <input>, <textarea> and [contenteditable] elements.\n    Examples:\n        fill(\'237\', \'example value\')\n\n        fill(\'45\', \'multi-line\nexample\')\n\n        fill(\'a12\', \'example with "quotes"\')\n\nselect_option(bid: str, options: str | list[str])\n    Description: Select one or multiple options in a <select> element. You can specify option value or label to select. Multiple options can be selected.\n    Examples:\n        select_option(\'a48\', \'blue\')\n\n        select_option(\'c48\', [\'red\', \'green\', \'blue\'])\n\nclick(bid: str, button: Literal[\'left\', \'middle\', \'right\'] = \'left\', modifiers: list[typing.Literal[\'Alt\', \'Control\', \'ControlOrMeta\', \'Meta\', \'Shift\']] = [])\n    Description: Click an element.\n    Examples:\n        click(\'a51\')\n\n        click(\'b22\', button=\'right\')\n\n        click(\'48\', button=\'middle\', modifiers=[\'Shift\'])\n\ndblclick(bid: str, button: Literal[\'left\', \'middle\', \'right\'] = \'left\', modifiers: list[typing.Literal[\'Alt\', \'Control\', \'ControlOrMeta\', \'Meta\', \'Shift\']] = [])\n    Description: Double click an element.\n    Examples:\n        dblclick(\'12\')\n\n        dblclick(\'ca42\', button=\'right\')\n\n        dblclick(\'178\', button=\'middle\', modifiers=[\'Shift\'])\n\nhover(bid: str)\n    Description: Hover over an element.\n    Examples:\n        hover(\'b8\')\n\npress(bid: str, key_comb: str)\n    Description: Focus the matching element and press a combination of keys. It accepts the logical key names that are emitted in the keyboardEvent.key property of the keyboard events: Backquote, Minus, Equal, Backslash, Backspace, Tab, Delete, Escape, ArrowDown, End, Enter, Home, Insert, PageDown, PageUp, ArrowRight, ArrowUp, F1 - F12, Digit0 - Digit9, KeyA - KeyZ, etc. You can alternatively specify a single character you\'d like to produce such as "a" or "#". Following modification shortcuts are also supported: Shift, Control, Alt, Meta, ShiftLeft, ControlOrMeta. 
ControlOrMeta resolves to Control on Windows and Linux and to Meta on macOS.\n    Examples:\n        press(\'88\', \'Backspace\')\n\n        press(\'a26\', \'ControlOrMeta+a\')\n\n        press(\'a61\', \'Meta+Shift+t\')\n\nfocus(bid: str)\n    Description: Focus the matching element.\n    Examples:\n        focus(\'b455\')\n\nclear(bid: str)\n    Description: Clear the input field.\n    Examples:\n        clear(\'996\')\n\ndrag_and_drop(from_bid: str, to_bid: str)\n    Description: Perform a drag & drop. Hover the element that will be dragged. Press left mouse button. Move mouse to the element that will receive the drop. Release left mouse button.\n    Examples:\n        drag_and_drop(\'56\', \'498\')\n\nupload_file(bid: str, file: str | list[str])\n    Description: Click an element and wait for a "filechooser" event, then select one or multiple input files for upload. Relative file paths are resolved relative to the current working directory. An empty list clears the selected files.\n    Examples:\n        upload_file(\'572\', \'/home/user/my_receipt.pdf\')\n\n        upload_file(\'63\', [\'/home/bob/Documents/image.jpg\', \'/home/bob/Documents/file.zip\'])\n'
            }
          },
          'required': [
            'code'
          ]
        }
      }
    },
    {
      'type': 'function',
      'function': {
        'name': 'execute_ipython_cell',
        'description': 'Run a cell of Python code in an IPython environment.\n* The assistant should define variables and import packages before using them.\n* The variable defined in the IPython environment will not be available outside the IPython environment (e.g., in terminal).\n',
        'parameters': {
          'type': 'object',
          'properties': {
            'code': {
              'type': 'string',
              'description': 'The Python code to execute. Supports magic commands like %pip.'
            }
          },
          'required': [
            'code'
          ]
        }
      }
    },
    {
      'type': 'function',
      'function': {
        'name': 'str_replace_editor',
        'description': 'Custom editing tool for viewing, creating and editing files in plain-text format\n* State is persistent across command calls and discussions with the user\n* If `path` is a file, `view` displays the result of applying `cat -n`. If `path` is a directory, `view` lists non-hidden files and directories up to 2 levels deep\n* The `create` command cannot be used if the specified `path` already exists as a file\n* If a `command` generates a long output, it will be truncated and marked with `<response clipped>`\n* The `undo_edit` command will revert the last edit made to the file at `path`\n\nNotes for using the `str_replace` command:\n* The `old_str` parameter should match EXACTLY one or more consecutive lines from the original file. Be mindful of whitespaces!\n* If the `old_str` parameter is not unique in the file, the replacement will not be performed. Make sure to include enough context in `old_str` to make it unique\n* The `new_str` parameter should contain the edited lines that should replace the `old_str`\n',
        'parameters': {
          'type': 'object',
          'properties': {
            'command': {
              'description': 'The commands to run. Allowed options are: `view`, `create`, `str_replace`, `insert`, `undo_edit`.',
              'enum': [
                'view',
                'create',
                'str_replace',
                'insert',
                'undo_edit'
              ],
              'type': 'string'
            },
            'path': {
              'description': 'Absolute path to file or directory, e.g. `/workspace/file.py` or `/workspace`.',
              'type': 'string'
            },
            'file_text': {
              'description': 'Required parameter of `create` command, with the content of the file to be created.',
              'type': 'string'
            },
            'old_str': {
              'description': 'Required parameter of `str_replace` command containing the string in `path` to replace.',
              'type': 'string'
            },
            'new_str': {
              'description': 'Optional parameter of `str_replace` command containing the new string (if not given, no string will be added). Required parameter of `insert` command containing the string to insert.',
              'type': 'string'
            },
            'insert_line': {
              'description': 'Required parameter of `insert` command. The `new_str` will be inserted AFTER the line `insert_line` of `path`.',
              'type': 'integer'
            },
            'view_range': {
              'description': 'Optional parameter of `view` command when `path` points to a file. If none is given, the full file is shown. If provided, the file will be shown in the indicated line number range, e.g. [11, 12] will show lines 11 and 12. Indexing at 1 to start. Setting `[start_line, -1]` shows all lines from `start_line` to the end of the file.',
              'items': {
                'type': 'integer'
              },
              'type': 'array'
            }
          },
          'required': [
            'command',
            'path'
          ]
        }
      }
    }
  ]
}
response = my_llm.completion(**params)

@enyst (Collaborator) commented Feb 20, 2025

When I use the litellm proxy as a base url and the API key, I get an error. When I use Anthropic directly (base_url=None), I don't get any error.

Oh, that sounds familiar! Two things here, please:

  • if you try the code in this comment, we can see the exact dict that liteLLM sends to the LLM API, right where a curl would hit it. The logs here show the dict before we send it to litellm, which is why it may be relevant to see the other one: if there's a difference, e.g. if it stripped out the tool_call_id (!), then we know.
  • please use litellm_proxy/anthropic/claude-3-5-sonnet-20241022 instead of litellm_proxy/claude-3-5-sonnet-20241022 (see the sketch below). The first accesses Anthropic via the proxy, but the second hits a custom configuration defined on the proxy under the name 'claude-3-5-sonnet-20241022', which resolves to Bedrock, sends the call to Bedrock, ALWAYS fails (in the past few months), then runs litellm's fallback code, finds Anthropic, and re-routes to Anthropic. 🤷

Either way, it sounds likely that the bug is not in our code but in litellm.
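
A sketch of the second suggestion, adapted from the repro script above; the API key is a placeholder and the exact routing behaviour of the model name is an assumption about the proxy setup:

from pydantic import SecretStr
from openhands.core.config.llm_config import LLMConfig
from openhands.llm.llm import LLM

llm_config = LLMConfig(
    model='litellm_proxy/anthropic/claude-3-5-sonnet-20241022',  # direct Anthropic route via the proxy
    api_key=SecretStr('******'),
    base_url='https://llm-proxy.staging.all-hands.dev',
)
my_llm = LLM(llm_config)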
