Negative Logits
Infos            -0.08
uomo             -0.07
homosex          -0.06
людям            -0.06
TEXT             -0.06
getObject        -0.06
ondere           -0.06
orientation      -0.06
-strokes         -0.06
دارم             -0.06

Positive Logits
repeal            0.13
repealed          0.10
dismant           0.08
deline            0.07
dismantle         0.07
ané               0.07
-------------</   0.07
debunk            0.06
mong              0.06
makta             0.06

Activation density: 0.001%