INDEX
Explanations
phrases related to social equality and injustice
Negative Logits
Token       Logit
ereotype    -0.15
HOOK        -0.15
hook        -0.14
ernen       -0.14
bells       -0.14
hog         -0.14
prt         -0.14
aldi        -0.13
Tracker     -0.13
uggy        -0.13
Positive Logits
Token         Logit
nou           0.17
anz           0.14
andest        0.14
Nielsen       0.14
odon          0.13
driving       0.13
istrovství    0.13
inton         0.13
instrumental  0.13
Mell          0.13
Activations Density: 0.019%