Explanations
phrases related to political debates and statements
sentences that express opinions or remarks
Negative Logits
undet        -0.81
unlocks      -0.74
reversible   -0.74
iencies      -0.72
versible     -0.71
handy        -0.71
dips         -0.71
unpredict    -0.69
fucked       -0.68
bolt         -0.68
Positive Logits
Others         1.17
They           1.16
Their          1.08
Critics        1.05
Specifically   1.00
Speaking       1.00
Some           0.99
Officials      0.98
Advoc          0.98
Supporters     0.98
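Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that method; the function names, the toy 2-dimensional decoder vector, and the 3-token vocabulary are hypothetical and are not taken from this feature.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score each token as decoder_dir . unembed[:, j] (assumed method).

    decoder_dir: list of floats, length d_model (hypothetical feature direction)
    unembed:     d_model rows, each a list of vocab_size floats
    vocab:       list of token strings, length vocab_size
    """
    scores = {}
    for j, tok in enumerate(vocab):
        # Dot product of the feature direction with token j's unembedding column.
        scores[tok] = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
    return scores


def top_and_bottom(scores, k):
    """Return (k highest-scoring tokens, k lowest-scoring tokens)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    return ranked[-k:][::-1], ranked[:k]


# Toy data for illustration only.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.3, 0.9],
           [0.4, 0.1, -0.2]]
vocab = ["They", "handy", "Critics"]

scores = logit_effects(decoder_dir, unembed, vocab)
positive, negative = top_and_bottom(scores, k=1)
print(positive, negative)
```

With the toy values, "Critics" receives the largest score and "handy" the smallest, mirroring how the positive and negative lists are split.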
Activation Density: 0.490%
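Activation density is usually read as the percentage of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0; both the threshold and the function name are hypothetical. Under that definition, 0.490% corresponds to roughly 49 active tokens out of every 10,000.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 49 active tokens out of 10,000 gives a density of 0.49%.
sample = [0.0] * 9951 + [1.3] * 49
print(activation_density(sample))
```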