INDEX
Explanations
words related to protests
references to various types of protests and festivals
Negative Logits
gap          -0.70
nesia        -0.68
bra          -0.65
forward      -0.65
splitting    -0.63
space        -0.63
brain        -0.62
stable       -0.62
smart        -0.61
planes       -0.61
Positive Logits
ests         1.26
imony        1.04
iqu          0.94
icism        0.92
iques        0.85
osterone     0.84
ongs         0.83
icult        0.83
imates       0.82
icist        0.82
Activations Density 0.011%