INDEX
Explanations
sections with repeated sequences or structural patterns in the document
Negative Logits
[token missing]   -0.71
“                 -0.70
even              -0.67
.                 -0.66
‘                 -0.65
.                 -0.64
,                 -0.63
for               -0.60
e                 -0.59
-                 -0.59
Positive Logits
myſelf            1.09
occaf             1.05
purpoſe           1.03
Tikang            0.99
reaſon            0.96
itſelf            0.94
themſelves        0.94
poffible          0.94
pleaſure          0.94
cauſe             0.92
Activations Density 0.029%