INDEX
Explanations
mentions of specific groups associated with radical ideologies
Negative Logits
vertisement   -0.17
ather         -0.17
ayar          -0.15
окрем         -0.14
ach           -0.14
amerate       -0.14
Hv            -0.14
sein          -0.14
ictim         -0.14
abus          -0.14
Positive Logits
Lumpur        0.17
-large        0.15
vực           0.15
base          0.15
Klan          0.15
-big          0.15
Bryce         0.14
base          0.14
etz           0.14
stk           0.14
Activations Density 0.019%