INDEX
    Explanations

    names of individuals or entities

    proper nouns, particularly names of people

    Negative Logits
    Token               Logit
    "addons"            -0.74
    " respectively"     -0.73
    "izoph"             -0.65
    "thora"             -0.63
    "Tokens"            -0.63
    "notation"          -0.62
    " overlap"          -0.60
    " overwhelming"     -0.60
    "depending"         -0.59
    "20439"             -0.58
    Positive Logits
    Token               Logit
    " remembers"        1.07
    " celebrates"       1.04
    " writes"           1.01
    " teaches"          0.98
    " poses"            0.97
    " joins"            0.97
    " Profile"          0.96
    " speaks"           0.94
    " greets"           0.93
    " wears"            0.92
    Act Density 0.233%

    No Known Activations