INDEX
    Explanations

    Code/data formatting

    Negative Logits
    Token            Logit
    "-addons"        -0.06
    " (;"            -0.06
    "COMMAND"        -0.06
    "312"            -0.06
    " chairs"        -0.06
    "~~"             -0.06
    "/world"         -0.06
    " bots"          -0.06
    " cuffs"         -0.06
    "(control"       -0.06

    Positive Logits
    Token            Logit
    " Injury"         0.07
    "Pawn"            0.07
    " Onc"            0.06
    "_lin"            0.06
    "ipsis"           0.06
    ".friend"         0.06
    " Interracial"    0.06
    " Jerseys"        0.06
    (blank)           0.06
    "stdbool"         0.06

    Act Density 0.004%

    No Known Activations