Negative Logits
Token       Logit
mø          -0.08
побач       -0.07
Entr        -0.06
Drill       -0.06
jente       -0.06
ouis        -0.06
>>>>>>>     -0.06
chosen      -0.06
Compar      -0.06
preserved   -0.06
Positive Logits
Token       Logit
lang        0.14
Lang        0.14
lang        0.13
Lang        0.12
LANG        0.10
.Lang       0.10
/lang       0.10
langs       0.09
LANG        0.09
(lang       0.09
Activations Density: 0.005%