Multidimensional Rubric-oriented Reward Model Learning via Geometric Projection Reference Constraints
Paper: 2511.16139 (Published)
Shanzhi-M1 is a medical LLM alignment framework developed by Shanghai Mingpin Medical Data Technology Co., Ltd. Built on the Qwen3-32B base model, it uses three innovative designs to address the core pain points of existing medical LLMs: misalignment with clinical cognition, poor adaptation to evolving clinical standards, and high reward-model training costs. The framework integrates authoritative medical standards into the full training pipeline, enabling medical AI to move from "technically feasible" to "medically trustworthy."
| Clinical Scenario | Score | Performance Note |
|---|---|---|
| Emergency Referrals | 74.3 | Highest among all tested models; prioritizes risk timeliness |
| Medical Communication | 69.6 | Excels in patient adherence guidance |
| Context Seeking | 58.5 | Strong at proactive clinical information collection |
| Global Health | 59.2 | Adapts to diverse regional medical standards |
| Context Awareness | 52.4 | Maintains consistency across multi-turn clinical conversations |
The rubric space is organized along three axes:

- L (Core Dimensions): e.g., Information Content Quality, Clinical Reasoning, Compliance.
- M (Scenarios): e.g., Chronic Disease Management, Pediatric Consultation, Emergency Triage.
- N (Disciplines): e.g., Internal Medicine, Surgery, Pharmacy, Nursing.

Each training sample is a tuple (q, ab_i,q, ar_i,q, aw_i,q, Desc_i): a question, good/medium/poor reference answers, and the description of the rubric dimension.

Install the dependencies:

```shell
pip install transformers torch vllm "sglang>=0.4.6.post1"
```
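The per-dimension sample format above can be sketched as a plain data structure. This is a minimal illustration only; the field names and example strings are assumptions, not the paper's actual schema:

```python
from dataclasses import dataclass

# Illustrative sketch of one rubric-oriented training sample:
# a question q, three reference answers of descending quality
# (ab_i,q / ar_i,q / aw_i,q), and the textual description Desc_i
# of the rubric dimension being scored. Names are hypothetical.
@dataclass
class RubricSample:
    question: str        # q
    answer_good: str     # ab_i,q
    answer_medium: str   # ar_i,q
    answer_poor: str     # aw_i,q
    dimension_desc: str  # Desc_i

sample = RubricSample(
    question="Can a 3-year-old on acetaminophen also take compound cold medicine?",
    answer_good="No; many compound cold medicines already contain acetaminophen, risking overdose.",
    answer_medium="Probably not; check the label first.",
    answer_poor="Yes, both are safe to combine.",
    dimension_desc="Clinical Reasoning: identifies duplicate-ingredient overdose risk.",
)
print(sample.dimension_desc)
```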
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load model and tokenizer (replace with the Hugging Face repo once released)
model_name = "mingpinDZJ/Shanzhi-M1"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)

# Example: medical query (pediatric medication safety)
prompt = (
    "A 3-year-old child (pediatric scenario) is taking acetaminophen for fever. "
    "Can they also take compound cold medicine? "
    "Please explain the risks and recommendations."
)

# Format chat input (supports multi-turn conversations)
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Qwen3-style switch that enables clinical reasoning logging
)

# Generate response
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    temperature=0.1,  # low temperature for clinical accuracy
    top_p=0.95,
)

# Parse output (separate the reasoning trace from the final answer)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
try:
    # Locate the reasoning-end token (</think> in Qwen3-based models)
    end_token_id = tokenizer.encode("</think>")[0]
    reason_end_idx = len(output_ids) - output_ids[::-1].index(end_token_id)
    reasoning = tokenizer.decode(output_ids[:reason_end_idx], skip_special_tokens=True)
    answer = tokenizer.decode(output_ids[reason_end_idx:], skip_special_tokens=True)
except ValueError:
    reasoning = "No explicit reasoning logged."
    answer = tokenizer.decode(output_ids, skip_special_tokens=True)

# Print results
print(f"Clinical Reasoning: {reasoning}")
print(f"Final Answer: {answer}")
```
Serve with SGLang:

```shell
python -m sglang.launch_server \
  --model-path mingpinDZJ/Shanzhi-M1 \
  --reasoning-parser qwen3 \
  --mem-fraction-static 0.9
```

Or with vLLM:

```shell
vllm serve mingpinDZJ/Shanzhi-M1 \
  --reasoning-parser qwen3 \
  --tensor-parallel-size 1 \
  --dtype bfloat16
```
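Both launch commands expose an OpenAI-compatible chat endpoint, so a deployed instance can be queried over HTTP. Below is a minimal stdlib sketch; the port is an assumption (vLLM defaults to 8000, SGLang to 30000), so adjust it to your deployment:

```python
import json
import urllib.request

# Build an OpenAI-compatible chat-completions request for a locally
# served Shanzhi-M1 instance (port 8000 assumes vLLM's default).
payload = {
    "model": "mingpinDZJ/Shanzhi-M1",
    "messages": [
        {"role": "user", "content": "Is ibuprofen safe for a 3-year-old with fever?"}
    ],
    "temperature": 0.1,
    "max_tokens": 512,
}
request = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With a live server running, send the request and read the answer:
# with urllib.request.urlopen(request) as resp:
#     answer = json.load(resp)["choices"][0]["message"]["content"]
print(request.get_full_url())
```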
Licensed under the Apache License 2.0, which permits commercial use, modification, and redistribution.

Base model: Qwen/Qwen3-32B