INDEX
Explanations
occurrences of the word "Advertisement" in the text
Negative Logits
ctr        -0.89
reason     -0.67
Ranked     -0.67
cript      -0.64
icate      -0.63
enser      -0.63
ctor       -0.62
ayer       -0.61
ensibly    -0.60
mpeg       -0.59
Positive Logits
docking          0.67
DOI              0.64
environment      0.64
DRAGON           0.60
↵                0.60
elson            0.59
bed              0.59
<|endoftext|>    0.58
Adams            0.58
↵                0.57
Activations Density: 0.008%