INDEX
Explanations
specific formatting and structural elements in documents
numeric values and numerical expressions in code or mathematical notation.
Negative Logits
queſta        -0.84
ddelwed       -0.78
$_(           -0.78
betweenstory  -0.75
increí        -0.73
dieß          -0.73
Verſ          -0.73
ſind          -0.72
Italijani     -0.70
dieſes        -0.69
Positive Logits
</            0.43
<h2>          0.43
<b>           0.42
[toxicity=0]  0.38
ag            0.37
<h1>          0.37
به            0.36
></           0.36
</            0.36
appunto       0.36
Activations Density: 1.825%