INDEX
Explanations
instances of reporting or citing sources in news articles
Head Attr Weights
0:0.02
1:0.01
2:0.08
3:0.10
4:0.20
5:0.03
6:0.05
7:0.14
8:0.07
9:0.05
10:0.12
11:0.08
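A minimal sketch for reading the head attribution weights above: it copies the twelve listed values into a dictionary and ranks the heads by weight. The names `head_weights` and `top_heads` are hypothetical, introduced only for illustration, and nothing here reflects how the dashboard itself computes these numbers.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.02, 1: 0.01, 2: 0.08, 3: 0.10, 4: 0.20, 5: 0.03,
    6: 0.05, 7: 0.14, 8: 0.07, 9: 0.05, 10: 0.12, 11: 0.08,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights, largest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


ranked = top_heads(head_weights)
total = sum(head_weights.values())

print(ranked)           # heads 4, 7 and 10 carry the largest weights
print(round(total, 2))  # sum of the listed (rounded) weights
```

The listed weights sum to roughly 0.95 rather than 1, which may simply reflect rounding to two decimal places in the display.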
Negative Logits
handshake          -1.56
iors               -1.45
facade             -1.38
tremend            -1.37
resting            -1.36
satisfied          -1.35
matched            -1.35
respect            -1.35
ifter              -1.33
oooooooooooooooo   -1.33
Positive Logits
blogs              1.73
Buzz               1.54
Sources            1.50
Polit              1.49
blog               1.49
GMT                1.41
ipedia             1.40
Aug                1.40
WD                 1.40
UTH                1.38
Activations Density 0.003%