INDEX
Explanations
mentions of negative political situations or criticisms
Negative Logits
secut        -0.74
CVE          -0.70
simultane    -0.69
ocent        -0.68
attentive    -0.66
uncond       -0.63
SPONSORED    -0.63
amen         -0.60
etheless     -0.60
relativity   -0.60
Positive Logits
yard         1.06
yards        0.94
shaw         0.93
hire         0.89
books        0.88
book         0.87
aper         0.85
enhagen      0.85
herer        0.83
TING         0.83
Activations Density 0.022%