INDEX
    Explanations

    pronouns and modal verbs indicating possibilities or actions

    pronouns and references to collective actions or experiences

    Negative Logits
    Token           Logit
    "bats"          -0.66
    "xit"           -0.64
    "hend"          -0.62
    "oner"          -0.59
    "oling"         -0.58
    "oward"         -0.57
    " honorable"    -0.57
    "idge"          -0.57
    "prising"       -0.56
    " Whatever"     -0.56
    Positive Logits
    Token           Logit
    " already"      1.12
    " rarely"       1.08
    " seldom"       0.99
    " hadn"         0.98
    " lacks"        0.95
    " tends"        0.93
    " cannot"       0.93
    " hasn"         0.92
    " never"        0.91
    " lacked"       0.91
    Activation Density: 0.467%

    No Known Activations