INDEX
Explanations
occurrences of introductory words or phrases indicating a new section or thought in a text
Negative Logits
$MESS       -0.22
owing       -0.21
serrat      -0.21
paring      -0.20
plevel      -0.18
leccion     -0.18
complete    -0.17
give        -0.17
ersiz       -0.17
\č↵         -0.16
Positive Logits
odore       0.56
adays       0.54
etheless    0.42
atre        0.38
atomy       0.32
bsites      0.32
tlement     0.29
achusetts   0.29
jamin       0.28
stitute     0.28
Activations Density 0.718%