INDEX
Explanations
references to media influence, results, and underlying systems in society
Head Attr Weights
0:0.02
1:0.02
2:0.08
3:0.08
4:0.36
5:0.04
6:0.04
7:0.13
8:0.04
9:0.03
10:0.05
11:0.06
Negative Logits
onge        -1.67
yip         -1.65
outheast    -1.57
URI         -1.53
uu          -1.51
ustomed     -1.50
elope       -1.48
scrib       -1.48
osponsors   -1.45
swer        -1.41
Positive Logits
thood           1.59
Bridge          1.54
MSG             1.45
dash            1.44
Caption         1.44
Trilogy         1.41
Velvet          1.38
deliberations   1.38
ogenesis        1.37
experiments     1.37
Activations Density 0.018%