Explanations
various forms of the word "ad" or references to advertisements
Head Attribution Weights (head index: weight)
0:0.02
1:0.02
2:0.05
3:0.06
4:0.06
5:0.05
6:0.41
7:0.05
8:0.05
9:0.06
10:0.07
11:0.04
Negative Logits (token, logit)
leep          -1.44
¯             -1.42
usercontent   -1.41
icked         -1.31
Hallow        -1.23
seniors       -1.20
compuls       -1.19
sed           -1.16
etheless      -1.15
Jolly         -1.14
Positive Logits (token, logit)
igham         1.74
utical        1.45
IRO           1.44
ciation       1.36
iscons        1.35
ignt          1.33
emale         1.30
ダ            1.30
rouse         1.28
combe         1.27
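Token lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k lowest and k highest entries. The page does not say how this dashboard computes them, so the sketch below is an assumption: `decoder_dir`, `W_U`, the toy vocabulary, and the sizes are hypothetical stand-ins with random values, not real model weights.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Project a feature's decoder direction through the unembedding and
    return the k most negative and k most positive (token, logit) pairs."""
    logits = decoder_dir @ W_U          # shape: (vocab_size,)
    order = np.argsort(logits)          # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy setup: random weights, not data from any real model.
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50
W_U = rng.normal(size=(d_model, vocab_size))
decoder_dir = rng.normal(size=d_model)
vocab = [f"tok{i}" for i in range(vocab_size)]

negative, positive = top_logit_tokens(decoder_dir, W_U, vocab, k=10)
for tok, val in positive[:3]:
    print(f"{tok:<8}{val:6.2f}")
```

Sorting once with `argsort` and slicing both ends yields the two lists in the same order the dashboard displays them: most negative first, then most positive first.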
Activation Density: 0.002%
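A density of 0.002% corresponds to the feature firing on about 1 in 50,000 tokens (0.002% = 0.00002 = 1/50,000). The sketch below assumes density is the percentage of token positions whose activation exceeds a threshold of 0. Both that definition and the sample activations are hypothetical; the page does not state how the figure was measured.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Hypothetical activations over 100,000 tokens, with the feature firing at 2 positions.
acts = np.zeros(100_000)
acts[[123, 45_678]] = [3.2, 1.7]
print(f"Activation density: {activation_density(acts):.3f}%")  # Activation density: 0.002%
```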