INDEX
Explanations
news-related words such as "news" and "newsletter"
references to news articles or news-related content
Negative Logits
sweat      -0.76
cold       -0.72
slee       -0.71
erect      -0.69
goddamn    -0.68
gged       -0.67
angs       -0.67
vol        -0.66
stagger    -0.66
ibr        -0.65
Positive Logits
NEWS       1.10
news       1.02
letters    1.01
Reporting  0.96
chool      0.90
News       0.87
letter     0.86
headlines  0.82
lisher     0.82
Coverage   0.82
Activations Density: 0.004%