INDEX
Explanations
phrases related to newsletters and updates
Head Attr Weights
0:0.02
1:0.02
2:0.18
3:0.08
4:0.09
5:0.03
6:0.23
7:0.06
8:0.05
9:0.07
10:0.06
11:0.05
Negative Logits
ovember   -1.47
ooks      -1.41
ynt       -1.39
eele      -1.38
lly       -1.32
erent     -1.30
lf        -1.30
utral     -1.29
aleb      -1.27
inav      -1.27
Positive Logits
Corpus                   1.38
POWER                    1.21
corrid                   1.17
�                        1.14
UNHCR                    1.10
BuyableInstoreAndOnline  1.09
accordingly              1.08
Shine                    1.08
Hispan                   1.07
opio                     1.06
Activations Density 0.002%