INDEX
    Explanations

    pronouns and verbs related to actions

    instances of the pronoun "it" and similar terms indicating subjects or topics in context

    NEGATIVE LOGITS
    Token      Logit
    "Gender"   -0.77
    "orse"     -0.69
    "hart"     -0.68
    "mma"      -0.66
    "pat"      -0.65
    "priv"     -0.64
    "tnc"      -0.64
    " Aid"     -0.64
    "aca"      -0.63
    "grass"    -0.62
    POSITIVE LOGITS
    Token             Logit
    " nonetheless"    1.38
    " nevertheless"   1.34
    " persisted"      0.99
    " fortunately"    0.97
    " still"          0.89
    "theless"         0.89
    " certainly"      0.89
    "'ll"             0.89
    " couldn"         0.87
    " cannot"         0.87
    Activation Density: 0.359%

    No Known Activations