INDEX
Explanations
the presence of specific formatting or structural elements in text, particularly in mathematical or quantitative contexts
Negative Logits
?               -0.69
.               -0.66
)               -0.58
</h2>           -0.52
↵↵              -0.52
,               -0.52
:               -0.52
ii              -0.49
[toxicity=0]    -0.48
</b>            -0.47
Positive Logits
Савезне         1.27
nakalista       1.03
&___            1.01
ьаж             0.97
Personendaten   0.96
########.       0.95
"]="            0.95
kaarangay       0.94
autorytatywna   0.93
enterOuterAlt   0.92
Activations Density 0.000%