Explanations
the presence of specific introductory phrases or transitions in the text
Negative Logits
myſelf    -1.04
دانشنامهٔ    -0.97
becauſe    -0.92
houſe    -0.91
BibitemShut    -0.89
itſelf    -0.89
Eſ    -0.89
ſeveral    -0.88
purpoſe    -0.88
Monfieur    -0.87
Positive Logits
<eos>    1.09
<bos>    1.02
</strong>    0.95
</b>    0.92
</u>    0.80
</em>    0.78
’    0.78
'    0.74
(token not rendered)    0.73
</i>    0.72
Activations Density: 0.015%