INDEX
Explanations
phrases related to publicity and attention
Head Attr Weights
0:0.03
1:0.02
2:0.08
3:0.06
4:0.10
5:0.03
6:0.03
7:0.35
8:0.03
9:0.05
10:0.08
11:0.08
Negative Logits
otomy: -1.62
inement: -1.50
ometry: -1.47
omers: -1.42
guided: -1.38
Harmony: -1.36
ternal: -1.35
esthetic: -1.35
uitive: -1.34
itored: -1.32
Positive Logits
Crusade: 1.62
libel: 1.58
bucks: 1.45
oneself: 1.44
accuser: 1.42
domestically: 1.38
Reporting: 1.35
fraudulent: 1.35
sensational: 1.34
tremend: 1.34
Activations Density: 0.002%