INDEX
Explanations
various types of formatting and syntax elements within code or text
// and /// comments
Negative Logits
WriteTagHelper    -1.22
<unused8>         -1.16
<unused41>        -1.16
<unused74>        -1.16
laſſen            -1.16
[@BOS@]           -1.16
<unused16>        -1.16
<unused52>        -1.16
ब्रेकडाउन          -1.16
<unused43>        -1.16
Positive Logits
//                0.65
(not shown)       0.60
#                 0.57
I                 0.55
:                 0.54
_                 0.52
(not shown)       0.49
[toxicity=0]      0.49
I                 0.49
most              0.47
Activations Density 0.015%