INDEX
Explanations
punctuation marks, particularly commas
Negative Logits
<bos>    -1.21
'        -0.93
’        -0.82
“        -0.77
2        -0.74
<h2>     -0.74
<h1>     -0.70
"        -0.70
(        -0.69
1        -0.67
Positive Logits
myſelf          0.80
verständlich    0.80
oredCriteria    0.78
BibitemShut     0.78
היתה            0.75
Roskov          0.74
bibinfo         0.72
σθαι            0.72
uttavia         0.72
bibfield        0.70
Activations Density: 0.429%