INDEX
Explanations
The presence of the beginning of ordered sequences in the text.
Negative Logits
Token            Logit
(not rendered)   -0.48
'                -0.34
[…]              -0.32
<unused62>       -0.31
’                -0.28
mathrm           -0.28
<unused61>       -0.26
<tbody>          -0.26
[toxicity=0]     -0.23
begin            -0.22
Positive Logits
Token             Logit
Савезне           1.28
Personendaten     1.21
SharedCtor        1.16
ویکیپدی           1.14
TagMode           1.10
Normdatei         1.09
RegressionTest    1.05
SourceChecksum    1.05
мәкал             1.05
setVerticalGroup  1.03
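The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A minimal sketch of how such lists are commonly produced, assuming the feature's decoder direction is projected through the model's unembedding matrix and the scores are sorted; this dashboard's exact procedure (including any normalization) is not stated here, and `logit_effects`, `top_and_bottom`, and the toy matrices are hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding column (assumed d_model x vocab layout).

    decoder_dir: list of d_model floats (hypothetical feature decoder row)
    unembed: list of d_model rows, each of length len(vocab)
    vocab: list of token strings
    Returns a list of (token, score) pairs in vocabulary order.
    """
    d_model = len(decoder_dir)
    scores = []
    for j, tok in enumerate(vocab):
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        scores.append((tok, s))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    bottom = ranked[:k]        # most negative first, like "Negative Logits"
    top = ranked[::-1][:k]     # most positive first, like "Positive Logits"
    return top, bottom


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
scores = logit_effects([1.0, -0.5], [[1.0, 0.0, -1.0], [0.0, 2.0, 1.0]], ["a", "b", "c"])
top, bottom = top_and_bottom(scores, 2)
print(top)     # [('a', 1.0), ('b', -1.0)]
print(bottom)  # [('c', -1.5), ('b', -1.0)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other; a real vocabulary would call for a partial top-k selection instead of a full sort.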
Activation density: 0.006%
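Activation density is usually read as the share of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0; the function name and the threshold parameter are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 6 active positions out of 100,000 tokens.
acts = [0.0] * 99_994 + [1.0] * 6
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.006%
```

At 0.006%, the feature fires on roughly 6 of every 100,000 tokens, so it is extremely sparse.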