无微调基准测试

CUDA_VISIBLE_DEVICES=6,7,8,9
python -m vllm.entrypoints.openai.api_server
–model /netcache/huggingface/Qwen2.5-14B-Instruct
–tensor-parallel-size 4
–host 0.0.0.0
–port 18001
–gpu-memory-utilization 0.7
–dtype float16

基准测试结果

(无任何规则)
alt text
“timestamp”: “20251231_073007”,
“accuracy”: 0.4175,
“recall”: 0.4598,
“f1_score”: 0.3962,
“valid_count”: 103,
“correct_count”: 43,
“total_scenarios”: 103,
“correct_rate”: “41.75%”

(定性规则)
alt text
“timestamp”: “20251231_081307”,
“accuracy”: 0.4369,
“recall”: 0.3942,
“f1_score”: 0.3921,
“valid_count”: 103,
“correct_count”: 45,
“total_scenarios”: 103,
“correct_rate”: “43.69%”

(定量规则)
alt text
“timestamp”: “20251231_082437”,
“accuracy”: 0.3883,
“recall”: 0.3157,
“f1_score”: 0.2094,
“valid_count”: 103,
“correct_count”: 40,
“total_scenarios”: 103,
“correct_rate”: “38.83%”

维度无规则定性规则定量规则说明
正确率41.75 %43.69 %38.83 %都徘徊在 40 % 左右,比瞎猜稍微好一点。
召回率45.98 %39.42 %31.57 %无规则反而最高,说明人工规则写偏,可能误杀正样本。
F139.62 %39.21 %20.94 %定量规则 F1 崩到 20 %, Precision 被严重拉低。
规则副作用+1.94 pp–2.92 pp定性规则略涨,定量规则直接负优化,说明无缘由规则很容易负优化。

下周计划,利用现有数据尝试lora微调,提升模型对于轨迹的理解能力。

提高采样帧数,并且提取部分特征(轨迹统计特征)

alt text
alt text
"timestamp": "20251231_092829",
"accuracy": 0.534,
"recall": 0.4773,
"f1_score": 0.4641,
"valid_count": 103,
"correct_count": 55,
"total_scenarios": 103,
"correct_rate": "53.4%"
暂无评论

发送评论 编辑评论


				
|´・ω・)ノ
ヾ(≧∇≦*)ゝ
(☆ω☆)
(╯‵□′)╯︵┴─┴
 ̄﹃ ̄
(/ω\)
∠( ᐛ 」∠)_
(๑•̀ㅁ•́ฅ)
→_→
୧(๑•̀⌄•́๑)૭
٩(ˊᗜˋ*)و
(ノ°ο°)ノ
(´இ皿இ`)
⌇●﹏●⌇
(ฅ´ω`ฅ)
(╯°A°)╯︵○○○
φ( ̄∇ ̄o)
ヾ(´・ ・`。)ノ"
( ง ᵒ̌皿ᵒ̌)ง⁼³₌₃
(ó﹏ò。)
Σ(っ °Д °;)っ
( ,,´・ω・)ノ"(´っω・`。)
╮(╯▽╰)╭
o(*////▽////*)q
>﹏<
( ๑´•ω•) "(ㆆᴗㆆ)
😂
😀
😅
😊
🙂
🙃
😌
😍
😘
😜
😝
😏
😒
🙄
😳
😡
😔
😫
😱
😭
💩
👻
🙌
🖕
👍
👫
👬
👭
🌚
🌝
🙈
💊
😶
🙏
🍦
🍉
😣
Source: github.com/k4yt3x/flowerhd
颜文字
Emoji
小恐龙
花!
上一篇
下一篇