INDEX
Explanations
references to political or social reforms
references to reform-related topics or movements
Negative Logits
gaard     -0.65
infer     -0.62
McA       -0.61
ordon     -0.61
Saud      -0.60
ammy      -0.60
DoS       -0.59
Bryant    -0.59
Coulter   -0.59
Starg     -0.58
Positive Logits
ulation   1.19
atories   1.17
atted     1.16
ers       1.16
ulated    1.16
ulatory   1.11
ulations  1.11
er        1.04
rats      1.01
ulating   0.96
Activations Density 0.041%