About tool calling
#3 opened by sjzy23
After deploying the model with vLLM, I can invoke tools, but the model does not seem to receive tool responses properly: after I return a tool result, it just repeats the same tool call instead of answering.
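
Client side, the round trip looks roughly like this (a simplified sketch using the OpenAI SDK; the weather tool and its values are placeholders, the port and served model name match my deployment below):

```python
# Sketch of the tool-call round trip I expect to work.
# get_weather and its result are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="dummy")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What is the weather in Paris?"}]

# First turn: the model correctly emits a tool call here.
resp = client.chat.completions.create(
    model="Llama-4-Scout-17B-16E-Instruct",
    messages=messages,
    tools=tools,
)
call = resp.choices[0].message.tool_calls[0]

# Second turn: feed the tool result back. This is the step that fails --
# instead of answering, the model emits the same tool call again.
messages.append(resp.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps({"city": "Paris", "temp_c": 18}),
})
final = client.chat.completions.create(
    model="Llama-4-Scout-17B-16E-Instruct",
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)
```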
deploy args:
args:
- "--host=0.0.0.0"
- "--port=8080"
- "--model=RedHatAI/Llama-4-Scout-17B-16E-Instruct-quantized.w4a16"
- "--trust-remote-code"
- "--served-model-name=Llama-4-Scout-17B-16E-Instruct"
- "--max-model-len=128K"
- "--tensor-parallel-size=4"
- "--gpu-memory-utilization=0.9"
- "--enable-auto-tool-choice"
- "--tool-call-parser=llama3_json"
trace:
(attachment not shown)
final response:
(attachment not shown)
What should I change so that tool responses are handled correctly?

