INDEX
Explanations
references to individuals and groups involved in activism
Negative Logits
eda       -0.17
ÏĦαν      -0.15
ents      -0.15
etak      -0.15
ricks     -0.15
.dsl      -0.15
itorio    -0.14
hott      -0.14
Gould     -0.14
idl       -0.14
Positive Logits
arger     0.16
acre      0.15
ungle     0.14
olson     0.14
arga      0.14
ized      0.14
zie       0.13
uger      0.13
426       0.13
ãģĽ       0.13
Activations Density 0.008%