INDEX
Explanations
titles or sections of stories within a text
repeated mentions of stories or articles
Negative Logits
fame                -0.68
Wee                 -0.64
agall               -0.63
ŃĶ                  -0.63
Haram               -0.62
urous               -0.62
0000000000000000    -0.60
saf                 -0.60
ullivan             -0.58
ãĥĺãĥ©              -0.58
Positive Logits
Continued     1.06
continues     0.90
telling       0.89
Contin        0.86
stal          0.81
llers         0.63
Transcript    0.61
highlights    0.60
reprinted     0.60
CONTIN        0.60
Activations Density 0.007%
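
As a rough illustration of how to read the density figure, the sketch below assumes that activation density means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition for illustration; the dashboard's exact method is not stated.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 7 of 100,000 token positions.
sample = [0.0] * 99_993 + [1.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.007%"
```

Under that assumption, a density of 0.007% corresponds to the feature firing on roughly 7 of every 100,000 tokens.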