Explanations
descriptions of violence and social injustice
Negative Logits
ende      -0.16
Bud       -0.15
.uc       -0.15
スカ      -0.15
یتی       -0.14
sna       -0.14
Becker    -0.14
agged     -0.14
kla       -0.14
kp        -0.14
Positive Logits
odox          0.16
auer          0.16
Presbyterian  0.15
áo            0.14
TestCategory  0.14
mill          0.13
aste          0.13
วน            0.13
astes         0.13
eza           0.13
Activations Density 0.141%
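
A density figure like the one above is commonly read as the fraction of sampled token positions on which the feature fires. The sketch below shows that calculation under stated assumptions: the function name `activation_density`, the zero firing threshold, and the toy activation values are all hypothetical and are not taken from this dashboard or from any particular tool.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up for illustration): 1000 token positions, 2 of them active.
sample = [0.0] * 998 + [0.7, 1.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

A sparse feature of this kind is inactive on nearly all tokens, so the density is expected to be a small percentage.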