Negative Logits
figures       -0.08
魅            -0.08
wills         -0.08
Kristin       -0.08
appeals       -0.08
vou           -0.08
passion       -0.07
passionate    -0.07
ممتاز         -0.07
Exhib         -0.07
Positive Logits
删除          0.10
削            0.10
训练          0.09
Deleting      0.09
rän           0.09
删除          0.09
Deleted       0.09
掉            0.09
Deletion      0.08
Training      0.08
Activations Density: 0.001%