Explanations
references to news headlines, specifically related to front-page stories or prominent candidates
references to "front page" in news or media contexts
Negative Logits
ascript    -0.76
cles       -0.75
cham       -0.73
IRO        -0.73
awk        -0.69
ornings    -0.69
ilk        -0.68
fortune    -0.68
ptives     -0.67
dayName    -0.67
Positive Logits
runners    1.05
iers       1.01
lawn       0.89
porch      0.86
court      0.80
office     0.79
burner     0.77
ing        0.77
lines      0.76
ages       0.76
Activations Density: 0.026%