INDEX
Explanations
political figures and events in news articles
Negative Logits
Token       Logit
artif       -0.82
Lear        -0.78
blat        -0.75
ak          -0.75
metab       -0.65
Reloaded    -0.65
edIn        -0.65
assum       -0.64
crate       -0.63
paran       -0.63
Positive Logits
Token       Logit
aughs       0.84
inational   0.71
Abortion    0.70
respectively 0.68
igious      0.68
lees        0.67
culosis     0.67
ilitary     0.67
osponsors   0.67
illon       0.66
Activations Density: 6.127%