INDEX
Explanations
phrases related to advertisements
instances of the word "Ad" indicating advertising-related content
Negative Logits
Token      Logit
metab      -0.66
DRAG       -0.65
Zur        -0.64
behav      -0.64
tomat      -0.64
blah       -0.62
Sting      -0.61
Fn         -0.61
Tacoma     -0.61
ÏĦ         -0.60
Positive Logits
Token          Logit
vertising      1.41
roximately     1.35
elaide         1.21
elligence      1.16
resa           1.16
withstanding   1.14
ellectual      1.13
cluding        1.11
ropolitan      1.10
xon            1.10
Activations Density 0.125%
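
The density figure above can be read as the share of token positions on which the feature fires. The following is a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a nonzero
    (above-threshold) activation; the dashboard's exact definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 800 token positions, exactly one of which fires.
sample = [0.0] * 799 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # 0.125%
```

Under this reading, a density of 0.125% corresponds to the feature firing on roughly one token in every 800.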