INDEX
Explanations
phrases related to activism or protests
Negative Logits
inguishable    -0.73
onom           -0.70
ongo           -0.66
imet           -0.65
               -0.65
abul           -0.64
eport          -0.63
velop          -0.63
omal           -0.63
utsche         -0.61
Positive Logits
blah           1.22
secondly       1.13
etc            1.12
furthermore    1.02
lest           0.99
THEN           0.96
oh             0.93
thereby        0.88
etc            0.88
)).            0.85
Activations Density 0.603%
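
As a rough illustration of what a density figure like the one above could mean, the sketch below computes the percentage of token positions at which a feature's activation is above zero. This is an assumption about the definition, not something the page states; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumed definition (not confirmed by the source): density is the share
    of sampled token positions at which the feature fires at all.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 1000 token positions, 6 of which have a nonzero activation.
sample = [0.0] * 994 + [0.4, 1.2, 0.1, 2.5, 0.7, 0.3]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.603% would mean the feature is active on roughly 6 out of every 1000 sampled tokens.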