Explanations
numerical values and associated formatting in mathematical or legal contexts
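Negative Logits
Logit  Token
-0.41  [toxicity=0]
-0.34  <eos>
-0.29  <i>
-0.28  1
-0.28  Gerechtigkeit (German, "justice")
-0.27  <b>
-0.26  2
-0.26  kaybet (Turkish, "lose")
-0.25  Weiterbildung (German, "further training")
-0.24  Versammlung (German, "assembly")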
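Positive Logits
Logit  Token
1.26   SequentialGroup
1.13   مشين (Arabic, "shameful")
1.10   ब्रेकडाउन (Hindi transliteration of "breakdown")
1.09   فريبيس
1.07   propOrder
1.07   Personendaten (German, "personal data")
1.05   ValueStyle
1.03   LookAnd
1.01   كومونز (Arabic-script transliteration of "commons")
0.99   autorytatywna (Polish, "authoritative")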
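Logit lists like these are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The following is a minimal sketch of that ranking step in plain Python. It assumes that each row of `unembed` is one token's unembedding vector. The helper `logit_effects`, that layout, and the toy weights are all hypothetical, and the sketch does not claim to be how the numbers above were actually produced.

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and k most positive direct logit effects.

    Assumption (hypothetical layout): unembed[i] holds the unembedding vector
    for vocab[i], so the effect on token i is the dot product of the
    feature's decoder direction with unembed[i].
    """
    if len(unembed) != len(vocab):
        raise ValueError("unembed and vocab must have one entry per token")
    # Dot product of the decoder direction with every token's unembedding vector.
    scores = [sum(d * w for d, w in zip(decoder_dir, row)) for row in unembed]
    # Sort ascending by score: the front holds the most negative effects.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    # Reverse the same ranking to get the most positive effects, largest first.
    positive = ranked[::-1][:k]
    return negative, positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
neg, pos = logit_effects(
    decoder_dir=[1.0, 0.0],
    unembed=[[0.5, 1.0], [-0.4, 0.2], [1.2, -0.3]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg)  # [('b', -0.4)]
print(pos)  # [('c', 1.2)]
```

The code sorts once and slices both ends, which keeps the two lists consistent with each other. With a real vocabulary, a vectorized matrix product would replace the list comprehension.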
Activation Density: 0.841%
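Activation density is generally read as the share of sampled token positions on which the feature fires. On that reading, 0.841% corresponds to roughly 8 of every 1,000 tokens. The sketch below rests on two assumptions: a position counts as active when its activation is strictly above a threshold (zero by default), and the token sample behind the figure above is unknown. `activation_density` is a hypothetical helper.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold (zero by
    default); which tokens were sampled is outside the scope of this sketch.
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count positions where the feature is active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# One active position out of four gives a density of 25%.
print(activation_density([0.0, 0.0, 2.3, 0.0]))  # 25.0
```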