INDEX
Explanations
mentions of news outlets or news-related content
Head Attr Weights
0:0.03
1:0.03
2:0.06
3:0.05
4:0.05
5:0.05
6:0.35
7:0.06
8:0.04
9:0.05
10:0.09
11:0.09
Negative Logits
smell: -1.32
lawy: -1.30
malice: -1.23
orphans: -1.22
synergy: -1.22
natureconservancy: -1.22
ferment: -1.20
Takeru: -1.17
emulate: -1.15
depend: -1.15
Positive Logits
roth: 2.02
thur: 1.65
Finder: 1.62
essional: 1.58
aic: 1.51
icons: 1.48
nic: 1.48
foundland: 1.48
aires: 1.41
letters: 1.40
Activations Density 0.003%