INDEX
Explanations
the presence of document structure elements, particularly markers indicating the beginning of sections or other important formatting features
Negative Logits
Roskov     -0.97
purpoſe    -0.97
preſent    -0.95
cauſe      -0.93
houſe      -0.93
juſ        -0.93
uſe        -0.92
Reſ        -0.92
ſy         -0.92
twimg      -0.90
Positive Logits
</em>       1.13
</i>        1.02
</strong>   0.98
</b>        0.87
</sup>      0.86
</sub>      0.83
</u>        0.77
}}$         0.76
,           0.73
</s>        0.72
Activations Density: 0.055%