INDEX
    Explanations

    references to specific scientific terms or variables in mathematical contexts

    Negative Logits
    Token          Logit
    "↵↵"           -1.12
                   -1.09
                   -1.08
    ","            -1.04
    "1"            -0.96
    "2"            -0.94
    "."            -0.93
    " ("           -0.89
    "/"            -0.88
    (whitespace)   -0.88
    Positive Logits
    Token          Logit
    <unused43>     1.80
    <unused41>     1.80
    <pad>          1.79
    <unused3>      1.79
    <unused74>     1.79
    <unused79>     1.79
    <unused42>     1.79
    <unused28>     1.79
    <unused8>      1.79
    <unused14>     1.79
    Activation Density: 0.018%

    No Known Activations