INDEX
    Explanations

    mathematical expressions and symbols commonly used in formal proofs

    Negative Logits
    Token            Logit
    s                -0.68
    -                -0.65
    }                -0.62
    )                -0.62
    _                -0.61
    [toxicity=0]     -0.58
    (not shown)      -0.58
    (whitespace)     -0.56
    "                -0.55
     Kell            -0.54
    Positive Logits
    Token            Logit
    ſelves           1.02
     purpoſe         0.94
     raiſ            0.94
     —,              0.94
     uſ              0.94
     ſche            0.93
     iſt             0.92
    ſelf             0.92
     myſelf          0.91
     ſtand           0.89
    Act Density 0.611%

    No Known Activations