    Explanations

    names of people or entities

    proper nouns, specifically names

    Negative Logits
    Token         Logit
    "ORTS"        -0.71
    "FACE"        -0.70
    "ENTS"        -0.67
    "BILITY"      -0.67
    "ornia"       -0.65
    "pection"     -0.63
    "perature"    -0.63
    " arenas"     -0.63
    "士"          -0.62
    "sylv"        -0.62
    Positive Logits
    Token         Logit
    "jad"         0.99
    "henko"       0.74
    "atz"         0.74
    " Nak"        0.71
    "ndra"        0.71
    "kov"         0.68
    "opoulos"     0.68
    "ullah"       0.67
    "uty"         0.66
    "aleb"        0.66
    Activation Density: 0.173%

    No Known Activations