    Explanations

    Twitter usernames

    proper nouns, particularly names and usernames

    Negative Logits
        " substitutes"    -0.74
        "theless"         -0.73
        " borne"          -0.70
        " ACTIONS"        -0.70
        "龍契士"          -0.68
        " substituted"    -0.65
        " resid"          -0.65
        " cured"          -0.62
        " press"          -0.61
        " uncertain"      -0.60

    Positive Logits
        "uff"              0.88
        "Magikarp"         0.86
        "WithNo"           0.85
        "Jr"               0.84
        "trump"            0.83
        "_."               0.83
        "Own"              0.83
        "Whe"              0.82
        "FT"               0.82
        "td"               0.81
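The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. The page does not say how these values were computed; a common approach is to project the feature's decoder direction through the model's unembedding matrix and sort. The sketch below assumes exactly that, ignores the final layer norm, and uses hypothetical names (`logit_effects`, `top_and_bottom_logits`, `decoder_dir`, `unembed`) with toy numbers rather than the real model weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: unembed[i][j] is the weight from residual dimension i to
    vocabulary token j, and the final layer norm is ignored.
    """
    return {
        token: sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j, token in enumerate(vocab)
    }


def top_and_bottom_logits(
    effects: Dict[str, float], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional residual stream and 3-token vocabulary (invented values).
    effects = logit_effects(
        decoder_dir=[1.0, -1.0],
        unembed=[[1.0, 0.0, 2.0], [0.0, 1.0, 0.5]],
        vocab=["a", "b", "c"],
    )
    negative, positive = top_and_bottom_logits(effects, k=1)
    print(negative, positive)  # [('b', -1.0)] [('c', 1.5)]
```

With `k=10` over a real vocabulary, the two returned lists would correspond to the Negative Logits and Positive Logits columns, under the stated assumption about how they are derived.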
    Activation Density: 0.079%
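Activation density is usually read as the fraction of token positions on which the feature fires. The page does not state the firing threshold or the token set used, so the sketch below assumes "fires" means an activation strictly above zero; the function name `activation_density` is hypothetical, and the 79-in-100,000 input is invented only to reproduce the 0.079% figure.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activation exceeds threshold.

    Assumption: a position counts as firing when its activation is strictly
    greater than `threshold` (default 0.0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # 79 firing positions out of 100,000 tokens; all other positions are zero.
    acts = [1.0] * 79 + [0.0] * 99_921
    print(f"{activation_density(acts):.3%}")  # 0.079%
```

A density this low means the feature is silent on almost every token, which is consistent with a narrow explanation such as Twitter usernames.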

    No Known Activations