Explanations
phrases expressing resistance or susceptibility
Negative Logits
erto        -0.17
reck        -0.16
bons        -0.15
erah        -0.15
ierce       -0.15
meni        -0.14
937         -0.14
yro         -0.14
Kostenlose  -0.14
eki         -0.14
Positive Logits
gether      0.16
asting      0.15
ledo        0.14
asty        0.14
nov         0.14
Keeper      0.13
extr        0.13
Crimes      0.13
col         0.13
extr        0.13
Activations Density: 0.200%