    Explanations

    mathematical notation and variables

    Negative Logits
    0.76
    0.74
    )+"
    0.74
    0.71
    0.70
    0.70
    )+'
    0.70
    0.68
    0.68
    0.67
    Positive Logits
    Token (quoted)    Logit
    " \"              1.77
    " (\"             1.45
    " \,"             1.30
    "^{\"             1.28
    " [\"             1.20
    "$,"              1.20
    "(\"              1.19
    "-\"              1.18
    " \;"             1.15
    " {\"             1.15
    Act Density 0.276%

    No Known Activations