INDEX
    Explanations

    proper nouns, specifically names of individuals

    mentions of specific individuals or names in the text

    Negative Logits
    ALLY          -0.77
    NetMessage    -0.73
    opausal       -0.72
    eers          -0.71
    netflix       -0.70
     Disneyland   -0.68
     Kinnikuman   -0.64
    naissance     -0.64
    ĺħ            -0.63
    ISE           -0.62
    Positive Logits
    atche         1.07
    rite          0.97
    mus           0.93
    ani           0.87
    ans           0.86
    aku           0.86
    inx           0.85
    aji           0.85
    onso          0.85
    arah          0.83
    Activation Density: 0.030%

    No Known Activations