INDEX
    Explanations

    comments and annotations in code

    Negative Logits
        " ("        -0.16
        "is"        -0.15
        "-"         -0.15
        " Bun"      -0.15
        "olo"       -0.15
        " racks"    -0.15
        "ya"        -0.14
        " Pent"     -0.14
        "."         -0.14
        "aks"       -0.14
    Positive Logits
        "~-~-~-~-"       0.20
        "Č↵"             0.20
        "eof"            0.19
        "-BEGIN"         0.17
        "BOOLE"          0.17
        "UTILITY"        0.16
        " convenience"   0.16
        "\/\/"           0.16
        " helpers"       0.16
        " constants"     0.15
    Activation Density: 0.070%

    No Known Activations