INDEX
    Explanations

    mathematical expressions and symbols, particularly those formatted in parentheses

    Negative Logits
    }]);      -0.77
    ")))      -0.73
     iſt      -0.71
    '));      -0.69
    "]));     -0.68
    ]))       -0.67
    "]))      -0.67
    ")));     -0.65
     Beſ      -0.65
    ')));     -0.65
    Positive Logits
     (        1.46
    (\        1.45
    >(</      1.42
    ">(</     1.40
     }^{(     1.38
    (         1.36
              1.35
    __(       1.32
    -(        1.29
    {(        1.28
    Act Density 1.403%

    No Known Activations