INDEX
Explanations
references to advertisements and advertising-related content
Negative Logits
ĨĴ          -0.88
bred        -0.76
lihood      -0.76
otype       -0.70
Cyprus      -0.69
20439       -0.68
Patriarch   -0.68
theless     -0.68
Caribbean   -0.66
hatch       -0.64
Positive Logits
vertising      1.22
verts          0.98
vertis         0.95
ads            0.94
vertisements   0.93
idas           0.93
vertisement    0.91
elaide         0.90
strip          0.88
blocking       0.84
Activations Density: 0.013%