    Explanations

    phrases related to statements and opinions made by a specific individual

    instances of the pronoun "he" referring to a male subject

    Negative Logits
    Token             Logit
    "noon"            -0.80
    "etheless"        -0.69
    " cannabin"       -0.67
    "cious"           -0.66
    "BALL"            -0.64
    " interfering"    -0.64
    "berra"           -0.63
    "rocket"          -0.63
    "Operation"       -0.62
    "visible"         -0.62
    Positive Logits
    Token             Logit
    " said"           1.19
    " wrote"          1.11
    " joked"          1.04
    " says"           1.03
    " tweeted"        1.01
    "said"            0.99
    " told"           0.96
    "'d"              0.96
    " explained"      0.91
    " added"          0.91
    Act Density 0.056%

    No Known Activations