CUDA_VISIBLE_DEVICES=6,7,8,9
python -m vllm.entrypoints.openai.api_server
–model /netcache/huggingface/Qwen2.5-14B-Instruct
–tensor-parallel-size 4
–host 0.0.0.0
–port 18001
–gpu-memory-utilization 0.7
–dtype float16
基准测试结果
(无任何规则)
“timestamp”: “20251231_073007”,
“accuracy”: 0.4175,
“recall”: 0.4598,
“f1_score”: 0.3962,
“valid_count”: 103,
“correct_count”: 43,
“total_scenarios”: 103,
“correct_rate”: “41.75%”
(定性规则)
“timestamp”: “20251231_081307”,
“accuracy”: 0.4369,
“recall”: 0.3942,
“f1_score”: 0.3921,
“valid_count”: 103,
“correct_count”: 45,
“total_scenarios”: 103,
“correct_rate”: “43.69%”
(定量规则)
“timestamp”: “20251231_082437”,
“accuracy”: 0.3883,
“recall”: 0.3157,
“f1_score”: 0.2094,
“valid_count”: 103,
“correct_count”: 40,
“total_scenarios”: 103,
“correct_rate”: “38.83%”
| 维度 | 无规则 | 定性规则 | 定量规则 | 说明 |
|---|---|---|---|---|
| 正确率 | 41.75 % | 43.69 % | 38.83 % | 都徘徊在 40 % 左右,比瞎猜稍微好一点。 |
| 召回率 | 45.98 % | 39.42 % | 31.57 % | 无规则反而最高,说明人工规则写偏,可能误杀正样本。 |
| F1 | 39.62 % | 39.21 % | 20.94 % | 定量规则 F1 崩到 20 %, Precision 被严重拉低。 |
| 规则副作用 | — | +1.94 pp | –2.92 pp | 定性规则略涨,定量规则直接负优化,说明无缘由规则很容易负优化。 |
下周计划,利用现有数据尝试lora微调,提升模型对于轨迹的理解能力。
提高采样帧数,并且提取部分特征(轨迹统计特征)


"timestamp": "20251231_092829", "accuracy": 0.534, "recall": 0.4773, "f1_score": 0.4641, "valid_count": 103, "correct_count": 55, "total_scenarios": 103, "correct_rate": "53.4%"