About tool calling

#3 opened by sjzy23

After deploying the model with vLLM, I can get it to invoke tools, but it never seems to consume the tool responses: instead of producing a final answer, it keeps repeating the same tool calls.
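
For reference, here is the round trip I would expect to work against this deployment (a minimal sketch: host, port, and served model name are taken from the deploy args below; the get_weather tool and its canned result are hypothetical). The failure shows up at step 3, where the model should answer in plain text but repeats the tool call instead.

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="dummy")

    # Hypothetical tool, for illustration only.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    messages = [{"role": "user", "content": "What's the weather in Paris?"}]

    # 1) First request: the model should emit a tool call.
    resp = client.chat.completions.create(
        model="Llama-4-Scout-17B-16E-Instruct", messages=messages, tools=tools
    )
    call = resp.choices[0].message.tool_calls[0]  # assumes a tool call was made

    # 2) Feed the result back: keep the assistant turn containing the
    #    tool_calls, then append a "tool" message with a matching tool_call_id.
    messages.append(resp.choices[0].message)
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": '{"temp_c": 18, "condition": "cloudy"}',  # fake tool output
    })

    # 3) Second request: this should produce the final text answer. In my
    #    setup the model emits the same tool call again instead.
    final = client.chat.completions.create(
        model="Llama-4-Scout-17B-16E-Instruct", messages=messages, tools=tools
    )
    print(final.choices[0].message.content)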

deploy args:

        args:
          - "--host=0.0.0.0"
          - "--port=8080"
          - "--model=RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16"
          - "--trust-remote-code"
          - "--served-model-name=Llama-4-Scout-17B-16E-Instruct"
          - "--max-model-len=128K"
          - "--tensor-parallel-size=4"
          - "--gpu-memory-utilization=0.9"
          - "--enable-auto-tool-choice"
          - "--tool-call-parser=llama3_json"

trace: [screenshot]

final response: [screenshot]

What do I need to change to get tool responses handled correctly?
