    Explanations

    pronouns followed by verbs

    pronouns, particularly the words "he" and "she"

    Negative Logits
    "noon"            -0.85
    "rocket"          -0.69
    "anking"          -0.64
    "iries"           -0.64
    "uits"            -0.61
    "earch"           -0.60
    " intervening"    -0.60
    "NAT"             -0.59
    "reach"           -0.58
    "Role"            -0.58
    Positive Logits
    " said"           1.00
    " wrote"          0.97
    "'d"              0.95
    " joked"          0.93
    "said"            0.89
    " tweeted"        0.89
    " says"           0.87
    " laughed"        0.86
    "aeus"            0.85
    " exclaimed"      0.85
    Activation Density: 0.054%

    No Known Activations