INDEX
    Explanations

    names of famous individuals

    proper nouns, specifically names of individuals

    Negative Logits
    "intendent"    -0.74
    "Reviewer"     -0.70
    "Region"       -0.69
    "DCS"          -0.69
    " Dhabi"       -0.67
    "theless"      -0.67
    "tein"         -0.66
    "dylib"        -0.65
    " Purg"        -0.64
    " Ow"          -0.63
    Positive Logits
    "ravis"        0.70
    " lawy"        0.68
    "stad"         0.66
    "enty"         0.65
    "eman"         0.64
    "beck"         0.63
    "hler"         0.63
    "acher"        0.62
    "assian"       0.61
    "itton"        0.61
    Activation Density: 0.181%

    No Known Activations