INDEX
NEGATIVE LOGITS
scaling   -0.08
Bank      -0.08
BANK      -0.08
verrou    -0.07
pesado    -0.07
isn't     -0.07
.cur      -0.07
_LOCK     -0.07
nla       -0.07
Hagen     -0.07
POSITIVE LOGITS
leo       0.09
tones     0.08
juicio    0.08
sexuelle  0.08
ţi        0.08
judging   0.07
ffred     0.07
tone      0.07
Tone      0.07
yny       0.07
Activations Density 0.001%