Explanations
the presence of text formatting markers or structural elements in the document
Negative Logits
myſelf      -1.39
itſelf      -1.37
Reſ         -1.32
Anſ         -1.22
Theſe       -1.21
Houſe       -1.21
Efq         -1.21
Diſ         -1.19
Monfieur    -1.19
―――――       -1.19
Positive Logits
            0.73
↵↵          0.69
|           0.63
The         0.61
<eos>       0.60
•           0.58
<h1>        0.57
.           0.57
)))),       0.56
↵           0.55
Activations Density 0.033%