INDEX
Explanations
informational cues such as section headings and advertisements within a text
repeated references to "story" and "advertisement" in the text
Negative Logits
cele      -0.73
Amit      -0.63
Imper     -0.62
monog     -0.62
cel       -0.61
unrecogn  -0.60
ste       -0.60
pri       -0.59
authent   -0.58
phen      -0.58
Positive Logits
iculty    0.66
espie     0.66
etary     0.65
Extras    0.65
VIDEOS    0.64
acters    0.63
yright    0.63
miah      0.63
ERY       0.62
iola      0.62
Activations Density 0.109%
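
The page does not define the density figure. A minimal sketch, assuming it means the fraction of sampled tokens on which the feature has a nonzero activation; the function name `activation_density`, the `threshold` parameter, and the token counts below are hypothetical and chosen only to reproduce a value of 0.109%:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" is read as the share of tokens on which the
    feature fires at all; the source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 109 of which activate the feature.
acts = [0.0] * 99_891 + [1.5] * 109
density = activation_density(acts)
print(f"{density:.3%}")
```

Under that reading, a density of 0.109% corresponds to the feature firing on roughly one token in 900.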