INDEX
    Explanations

    words related to specific names or people

    proper nouns, specifically names of people and locations

    Negative Logits
    Token         Logit
    "iery"        -0.60
    "irens"       -0.56
    "mble"        -0.56
    "ãĥŁ"         -0.55
    " à¨"         -0.54
    "ß"           -0.54
    " Calais"     -0.54
    " Debor"      -0.54
    "notation"    -0.53
    "reviewed"    -0.53
    Positive Logits
    Token         Logit
    " himself"    0.99
    "'s"          0.94
    " realizes"   0.78
    " knew"       0.75
    " Himself"    0.75
    " knows"      0.74
    "â̲"          0.73
    " herself"    0.72
    " remembers"  0.70
    " Sr"         0.70
    Activation Density: 0.284%

    No Known Activations