INDEX
NEGATIVE LOGITS
assertTrue    -0.07
Appro         -0.07
intervention  -0.07
Otto          -0.06
thoroughly    -0.06
(targets      -0.06
              -0.06
Airbus        -0.06
عبار          -0.06
taking        -0.06
POSITIVE LOGITS
catastrophe   0.07
.sul          0.06
надання       0.06
舍            0.06
saç           0.06
lane          0.06
)'),          0.06
алізації      0.06
chatter       0.06
데이트        0.05
Activations Density 0.009%