INDEX
    Explanations

    coding-related structures and data

    Negative Logits
     ];↵↵      -0.19
    }")↵↵      -0.18
     "");↵↵    -0.18
    >");↵↵     -0.18
    )]↵↵       -0.17
     ]);↵↵     -0.17
    _;↵↵       -0.16
     );↵↵      -0.16
     {});↵↵    -0.16
    };↵↵       -0.16
    Positive Logits
    ↵↵↵        0.29
    ,↵↵↵       0.27
    ()↵↵↵      0.27
     *↵↵↵      0.26
    "↵↵↵       0.26
    '↵↵↵       0.25
     {}↵↵↵     0.25
     []↵↵↵     0.24
    ?↵↵↵       0.24
    /↵↵↵       0.24
    Activation Density 0.010%

    No Known Activations