INDEX
Explanations
phrases related to political statements or actions
Negative Logits
ense        -0.74
orient      -0.70
axter       -0.69
brid        -0.68
ENTS        -0.66
est         -0.66
encount     -0.65
ele         -0.64
outheast    -0.64
etheless    -0.64
Positive Logits
Nope        1.48
Yep         1.20
Yeah        1.15
Absolutely  1.10
Nah         1.10
Probably    1.09
Possibly    1.07
Hmm         1.01
Yes         1.01
Sure        1.00
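The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists could be produced, assuming each score is the dot product of the feature's decoder direction with a token's column of the unembedding matrix (W_U); the function names and toy data are hypothetical, and the dashboard's actual pipeline may normalize or post-process differently.

```python
# Hypothetical sketch: rank tokens by a feature's direct logit effect,
# assuming effect(token) = dot(decoder_dir, W_U[:, token]).

def logit_effects(decoder_dir, unembed, vocab):
    """Return (token, effect) pairs sorted from most negative to most positive.

    decoder_dir: list of d_model floats (the feature's decoder row).
    unembed: d_model rows, each a list of len(vocab) floats (W_U).
    vocab: token strings, one per column of unembed.
    """
    effects = []
    for j, tok in enumerate(vocab):
        # Dot product of the decoder direction with this token's unembedding column.
        effect = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        effects.append((tok, effect))
    return sorted(effects, key=lambda pair: pair[1])


def top_and_bottom(decoder_dir, unembed, vocab, k=10):
    """Return the k most suppressed and the k most promoted tokens."""
    ranked = logit_effects(decoder_dir, unembed, vocab)
    negative = ranked[:k]          # most negative effects first
    positive = ranked[::-1][:k]    # most positive effects first
    return negative, positive


# Toy usage with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_and_bottom(
    [1.0, 0.5],
    [[1.0, -0.5, 0.5], [0.8, -0.4, 0.2]],
    ["Nope", "ense", "Yes"],
    k=1,
)
```

With k=10 over a full vocabulary, the two returned lists correspond in shape to the Negative Logits and Positive Logits tables.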
Activation density: 0.077%
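Activation density is read here as the percentage of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the threshold and function name are hypothetical):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 77 firing tokens out of 100,000 sampled tokens gives a density of about 0.077%.
acts = [0.0] * 99_923 + [0.5] * 77
density = activation_density(acts)
```

Under this reading, a density of 0.077% means the feature is active on fewer than 1 in 1,000 tokens.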