INDEX
    Explanations

    various types of formatting and syntax elements within code or text

    Negative Logits
    WriteTagHelper
    -1.22
    <unused8>
    -1.16
    <unused41>
    -1.16
    <unused74>
    -1.16
     laſſen
    -1.16
    [@BOS@]
    -1.16
    <unused16>
    -1.16
    <unused52>
    -1.16
     ब्रेकडाउन
    -1.16
    <unused43>
    -1.16
    Positive Logits
    //
    0.65
    0.60
    #
    0.57
     I
    0.55
    :
    0.54
    _
    0.52
        
    0.49
    [toxicity=0]
    0.49
    I
    0.49
     most
    0.47
    Act Density 0.015%

    No Known Activations