INDEX
Explanations
captions or headings in articles
text formatting indicators and structural elements
NEGATIVE LOGITS
Token    Logit
112      -0.70
82       -0.67
112      -0.64
78       -0.63
ERC      -0.62
1932     -0.62
222      -0.61
138      -0.61
138      -0.60
132      -0.60
POSITIVE LOGITS
Token    Logit
5        1.34
5        1.24
Fif      0.97
five     0.92
fifth    0.90
fifth    0.90
525      0.87
ķ        0.85
505      0.85
525      0.84
Activations Density: 0.065%