INDEX
Explanations
phrases related to news headlines or articles
references to political events and discussions, particularly those involving interviews and statements
Negative Logits
artif     -0.80
ak        -0.77
Lear      -0.74
blat      -0.72
assum     -0.69
appre     -0.64
eday      -0.63
nav       -0.63
metab     -0.63
edIn      -0.62
Positive Logits
isconsin   0.84
igious     0.74
Republican 0.74
aughs      0.72
olin       0.71
lees       0.71
ptive      0.69
uania      0.68
ributes    0.68
Lago       0.68
Activation Density: 0.065%