INDEX
    Explanations

    Twitter usernames

    underscore characters or special formatting

    Negative Logits
    atform          -0.74
     Levine         -0.73
     Holding        -0.69
     Frazier        -0.69
     Pearce         -0.69
     FML            -0.68
     Manson         -0.68
     Ago            -0.68
    quished         -0.68
     Stef           -0.67
    Positive Logits
    default         1.06
    dict            1.02
    chance          1.00
    blank           0.99
    EStreamFrame    0.97
    tro             0.95
    events          0.94
    gradient        0.94
    token           0.92
    delay           0.92
    Act Density 0.024%

    No Known Activations