INDEX
    Explanations

    mentions of prestigious titles or awards

    proper nouns, particularly names and titles

    Negative Logits
    Token          Logit
    "enegger"      -1.11
    "schild"       -0.83
    "icago"        -0.81
    " bracelet"    -0.73
    "lihood"       -0.71
    " Wink"        -0.69
    " Beckham"     -0.66
    "ded"          -0.65
    " dred"        -0.64
    "ORGE"         -0.63
    Positive Logits
    Token          Logit
    "ests"         1.15
    "zes"          1.07
    "ety"          1.01
    "esses"        0.98
    "eties"        0.89
    "heed"         0.87
    "quet"         0.87
    "vy"           0.86
    "ific"         0.86
    "ë"            0.86
    Activation Density: 0.009%

    No Known Activations