Explanations
mentions of the word "news" at high activation levels
words and phrases related to news and reporting
Negative Logits
mutually          -0.65
istically         -0.64
rapists           -0.64
rapist            -0.62
goodwill          -0.62
Reconstruction    -0.62
unarmed           -0.61
disarm            -0.60
sighted           -0.59
uras              -0.59
Positive Logits
chool              1.02
ews                1.00
ource              0.96
VIDEOS             0.91
hower              0.89
peed               0.89
atcher             0.86
ystem              0.86
ashington          0.86
velt               0.85
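The two lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). Such effects are commonly obtained by projecting the feature's decoder direction through the model's unembedding, but the page does not say how these values were computed. The sketch below therefore assumes the per-token effects are already available as a mapping and shows only the ranking step. The helper `top_logits` and the sample dictionary are hypothetical; the numbers are a subset of the values listed above.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def top_logits(effects: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and the k most positive (token, effect) pairs.

    Hypothetical helper: assumes `effects` maps each token to this feature's
    logit effect. With a small vocabulary the two lists may overlap.
    """
    # Sort once by effect value, ascending (most negative first)
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]            # most negative effects
    positive = ranked[::-1][:k]      # most positive effects, largest first
    return negative, positive

# Illustrative subset of the values listed above
effects = {"mutually": -0.65, "istically": -0.64, "unarmed": -0.61,
           "chool": 1.02, "ews": 1.00, "velt": 0.85}
negative, positive = top_logits(effects, k=2)
print(negative)  # [('mutually', -0.65), ('istically', -0.64)]
print(positive)  # [('chool', 1.02), ('ews', 1.0)]
```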
Activation Density: 0.006%
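This sketch reads activation density as the percentage of token positions on which the feature fires, meaning its activation is above zero. That definition is an assumption, because the page reports only the figure. The helper `activation_density` and the synthetic run of 100,000 tokens are hypothetical. They show that a density of 0.006% corresponds to roughly 6 firing positions per 100,000 tokens.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Hypothetical helper: assumes density means "fraction of tokens with a
    positive activation", expressed as a percentage.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Synthetic example: 6 firing positions out of 100,000 tokens
acts = [0.0] * 100_000
for i in range(6):
    acts[i * 1000] = 3.5
print(f"{activation_density(acts):.3f}%")  # 0.006%
```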