About tool calling

#3 opened by sjzy23

After deploying the model with vLLM, I can get it to invoke tools, but it never seems to consume the tool responses: instead of producing a final answer, it keeps repeating the same tool calls.
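
For reference, here is the round trip I would expect to work against this deployment (a minimal sketch: host, port, and served model name are taken from the deploy args below; the get_weather tool and its canned result are hypothetical). The failure shows up at step 3, where the model should answer in plain text but repeats the tool call instead.

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8080/v1", api_key="dummy")

    # Hypothetical tool, for illustration only.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    messages = [{"role": "user", "content": "What's the weather in Paris?"}]

    # 1) First request: the model should emit a tool call.
    resp = client.chat.completions.create(
        model="Llama-4-Scout-17B-16E-Instruct", messages=messages, tools=tools
    )
    call = resp.choices[0].message.tool_calls[0]  # assumes a tool call was made

    # 2) Feed the result back: keep the assistant turn containing the
    #    tool_calls, then append a "tool" message with a matching tool_call_id.
    messages.append(resp.choices[0].message)
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": '{"temp_c": 18, "condition": "cloudy"}',  # fake tool output
    })

    # 3) Second request: this should produce the final text answer. In my
    #    setup the model emits the same tool call again instead.
    final = client.chat.completions.create(
        model="Llama-4-Scout-17B-16E-Instruct", messages=messages, tools=tools
    )
    print(final.choices[0].message.content)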

deploy args:

        args:
          - "--host=0.0.0.0"
          - "--port=8080"
          - "--model=RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16"
          - "--trust-remote-code"
          - "--served-model-name=Llama-4-Scout-17B-16E-Instruct"
          - "--max-model-len=128K"
          - "--tensor-parallel-size=4"
          - "--gpu-memory-utilization=0.9"
          - "--enable-auto-tool-choice"
          - "--tool-call-parser=llama3_json"

trace: [screenshot]

final response: [screenshot]

What do I need to change to get tool responses handled correctly?
