    Explanations

    Twitter usernames

    underscore characters in usernames or handles

    Negative Logits
    Token            Logit
    " screenings"    -0.70
    "eteria"         -0.69
    " Manson"        -0.68
    "pload"          -0.67
    " ric"           -0.67
    " aud"           -0.66
    " Turing"        -0.66
    " repent"        -0.64
    " Casey"         -0.63
    " chlorine"      -0.63

    Positive Logits
    Token       Logit
    "ebook"     1.23
    "chance"    1.03
    "tro"       0.98
    "dust"      0.97
    "must"      0.93
    "vs"        0.92
    "blank"     0.91
    "pill"      0.91
    "main"      0.90
    "dict"      0.89

    Activation Density: 0.020%

    No Known Activations