INDEX
Explanations
segments of text that indicate the start of a new section or paragraph
the beginning of sentences or paragraphs
finite difference
Negative Logits
Token       Logit
,           -0.72
a           -0.71
and         -0.65
in          -0.65
of          -0.64
on          -0.64
as          -0.63
her         -0.62
et          -0.60
val         -0.59
Positive Logits
Token       Logit
itſelf      1.33
ſeveral     1.27
myſelf      1.26
Efq         1.19
whoſe       1.18
ſelves      1.18
Monfieur    1.17
Houſe       1.15
Reſ         1.13
raiſ        1.11
Activations Density: 0.409%