INDEX
Explanations
phrases related to public events or protests
references to public events or gatherings for a cause
Negative Logits
Token       Logit
oho         -0.70
ecd         -0.68
efe         -0.68
saline      -0.67
lake        -0.67
chance      -0.66
bons        -0.66
esthetic    -0.65
paste       -0.64
laus        -0.64
Positive Logits
Token            Logit
demonstration    0.89
demonstrations   0.79
GOODMAN          0.75
emonium          0.71
glim             0.71
demonstrators    0.70
arily            0.68
ary              0.68
antes            0.68
ank              0.67
Activations Density: 0.012%