    Explanations

    pronouns or nouns referring to people

    references to people and their actions or experiences

    Negative Logits
     Eleven       -0.75
     Around       -0.67
    wikipedia     -0.66
    amaz          -0.65
    ogue          -0.64
     Greatest     -0.64
     Deg          -0.63
     Dayton       -0.62
    icion         -0.62
    aign          -0.61

    Positive Logits
     knew          1.00
     lacked        1.00
     didn          0.98
    've            0.95
     feared        0.93
     forgot        0.93
     got           0.91
     couldn        0.89
     didnt         0.89
     hadn          0.89
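The two lists rank tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists can be produced, assuming (this page does not confirm it) that each token is scored by the dot product of the feature's decoder direction with that token's unembedding vector. The helper names `logit_effects` and `top_and_bottom` and the 2-d toy vectors are hypothetical, and the values on this page also appear to be rescaled, which the sketch does not attempt.

```python
from typing import Dict, List, Tuple

# Hypothetical helper: score each token by the dot product of the feature's
# decoder direction with that token's unembedding vector (assumed convention).
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

# Hypothetical helper: split the scores into the k most positive and the
# k most negative tokens, mirroring the two lists above.
def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

# Invented toy data; a leading space marks a word-initial token, as in the lists.
decoder_dir = [1.0, 0.0]
unembed = {
    " knew": [0.9, 0.1],
    " lacked": [0.8, 0.3],
    " Eleven": [-0.7, 0.2],
    "wikipedia": [-0.6, 0.5],
}
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(top)     # [(' knew', 0.9), (' lacked', 0.8)]
print(bottom)  # [(' Eleven', -0.7), ('wikipedia', -0.6)]
```

Sorting once and slicing from both ends keeps the positive and negative lists consistent with each other; with a real vocabulary the same idea applies, only the vectors get longer.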
    Activation Density 0.137%
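An activation density of 0.137% means the feature fires on roughly one token in 730. A minimal sketch of the computation, assuming "fires" means an activation strictly above zero; the helper name `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's exact threshold and token sample are not stated on this page.

```python
from typing import List

# Hypothetical helper: percentage of token positions on which the feature
# fires, taking "fires" to mean activation > threshold (assumed definition).
def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Toy check: 137 firing positions out of 100,000 gives 0.137%.
acts = [1.0] * 137 + [0.0] * (100_000 - 137)
density = activation_density(acts)
print(f"{density:.3f}%")       # 0.137%
print(round(100 / density))    # 730, i.e. about one firing per 730 tokens
```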

    No Known Activations