Explanations
phrases related to social justice and activism
Negative Logits
Token         Logit
nud           -0.15
Loft          -0.14
_ASSUME       -0.14
pheric        -0.14
걸            -0.14
кул           -0.14
istrovství    -0.13
uario         -0.13
hq            -0.13
ofile         -0.13
Positive Logits
Token         Logit
#             0.22
We            0.22
No            0.20
stop          0.20
Stop          0.20
enough        0.19
Stand         0.19
no            0.19
we            0.18
Vote          0.18
Activations Density 0.123%
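
As a rough illustration of how a density figure like the one above could be computed, the sketch below assumes that "density" means the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name activation_density, the threshold parameter, and the sample values are all assumptions for illustration and are not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The page itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations: 3 of the 8 tokens are active.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.4, 0.0, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.123% would mean the feature fires on roughly 1 in every 800 sampled tokens.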