INDEX
    Explanations

    names and user profiles

    presence of the end-of-text token

    Negative Logits
     Instr      -0.82
     Seym       -0.66
     Canary     -0.65
     Pound      -0.63
     Niet       -0.62
     Channel    -0.62
     Ninth      -0.60
     Moroc      -0.60
     [*         -0.60
     Kik        -0.60

    Positive Logits
    ":{"        0.77
    @           0.73
    lement      0.72
    _           0.68
     Profile    0.66
    mosp        0.66
    llular      0.66
    roid        0.65
    podcast     0.63
    ocious      0.63
    Act Density 0.182%

    No Known Activations