INDEX
Explanations
phrases related to articles and advertisements
Head Attr Weights
0:0.02
1:0.01
2:0.32
3:0.07
4:0.11
5:0.03
6:0.13
7:0.06
8:0.04
9:0.05
10:0.06
11:0.05
Negative Logits
owship: -1.57
quartered: -1.54
arent: -1.52
oided: -1.44
ooting: -1.42
reet: -1.38
pless: -1.36
esville: -1.34
ezvous: -1.33
asted: -1.32
Positive Logits
Whats: 1.46
裏�: 1.37
Accessory: 1.36
cmp: 1.35
�醒: 1.33
968: 1.32
802: 1.28
1.27
cannabin: 1.27
ilon: 1.25
Activations Density 0.003%