INDEX
Explanations
numerical indicators of the order of written content
references to specific chapters in a text
Negative Logits
oggles    -0.76
zzle      -0.74
fitting   -0.71
pes       -0.69
berman    -0.67
yg        -0.65
zzi       -0.63
enez      -0.63
claimed   -0.63
otal      -0.62
Positive Logits
chapter   0.99
chapters  0.94
ĸļ        0.90
apeake    0.84
chapter   0.83
Chapter   0.72
GOODMAN   0.71
Chapters  0.71
acters    0.71
20439     0.70
Activations Density: 0.004%