INDEX
    Explanations

    proper nouns, specifically people's names

    references to specific names and proper nouns

    Negative Logits
    Token          Logit
    "inately"      -0.91
    "è¦ļéĨĴ"       -0.73
    " Canary"      -0.67
    " DISTR"       -0.66
    " prevailing"  -0.65
    " seeker"      -0.64
    "é¾įå¥ij士"     -0.64
    " compr"       -0.63
    "¥µ"           -0.62
    " blanket"     -0.61
    Positive Logits
    Token          Logit
    "ny"           1.19
    " Diesel"      1.09
    "eland"        0.97
    "ita"          0.96
    "ned"          0.94
    "lass"         0.93
    "iti"          0.91
    "ificial"      0.89
    "omial"        0.88
    "ners"         0.87
    Activation Density: 0.030%

    No Known Activations