Explanations
phrases discussing power dynamics and victimization in societal contexts
Negative Logits
Token           Logit
somewhat        -0.84
somewhat        -0.81
trochu          -0.77
biraz           -0.77
trochę          -0.75
Somewhat        -0.75
nieco           -0.75
agak            -0.72
Somewhat        -0.68
lidt            -0.67
Positive Logits
Token           Logit
unbelievably    1.10
absolutely      1.09
absolutamente   1.07
utterly         1.04
абсолютно       1.02
literally       1.01
incredibly      1.00
absolutely      0.98
extrêmement     0.97
perfectly       0.96
Activations Density: 2.489%