INDEX
Explanations
statements related to political debates and policies
Negative Logits
edia      -0.70
ery       -0.66
erick     -0.64
agram     -0.63
ETHOD     -0.62
etting    -0.61
ickers    -0.61
ridge     -0.60
rique     -0.60
icky      -0.59
Positive Logits
except        1.61
except        1.60
Including     1.48
including     1.40
including     1.39
regardless    1.38
irrespective  1.34
INCLUD        1.25
excluding     1.17
imaginable    1.16
Activation Density: 0.424%