INDEX
Explanations
references to political figures and their statements or actions
Negative Logits
affer        -0.17
agen         -0.17
.localized   -0.16
Anchor       -0.15
ulg          -0.15
analyzes     -0.14
mention      -0.14
gaps         -0.14
Suit         -0.14
ocol         -0.14
Positive Logits
yesterday    0.23
flag         0.22
today        0.21
rub          0.20
reveal       0.19
tonight      0.18
revealing    0.18
welcoming    0.18
robust       0.18
Yesterday    0.17
Activations Density: 0.149%