Explanations
words related to news organizations or news content
instances of the word "News"
Negative Logits
Token       Logit
ught        -0.71
uras        -0.69
vernment    -0.67
uates       -0.61
downs       -0.61
uations     -0.61
uve         -0.61
hump        -0.61
lumber      -0.61
stagger     -0.60
Positive Logits
Token       Logit
letters     1.37
room        1.29
Hour        1.13
peak        1.09
radio       1.09
groups      1.05
night       1.04
Radio       0.99
reader      0.98
letter      0.98
Activations Density: 0.033%