INDEX
    Explanations

    the word "He" appearing in the text

    mentions of a particular individual named "He" or a similar pronoun

    Negative Logits
      " Thoughts"          -0.67
      " Manip"             -0.65
      " reality"           -0.60
      " Disclosure"        -0.59
      " Cumm"              -0.57
      " Nichols"           -0.57
      " Gems"              -0.56
      " legality"          -0.56
      " Appropriations"    -0.55
      " anonymously"       -0.55
    Positive Logits
      "arer"     1.19
      "eded"     1.17
      "arers"    1.15
      "lling"    1.11
      "eding"    1.09
      "lder"     1.01
      "arth"     0.99
      "ather"    0.99
      "ALTH"     0.98
      "pton"     0.97
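Lists of negative and positive logits like these are usually obtained by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the lowest and highest entries. The sketch below illustrates that idea under stated assumptions: `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, the final LayerNorm is ignored, and the weights are random toy values. The exact procedure behind this dashboard may differ.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct effect of a feature's decoder direction on their logits.

    decoder_dir: (d_model,) decoder vector of one SAE feature (hypothetical input)
    W_U:         (d_model, vocab_size) unembedding matrix
    vocab:       list of vocab_size token strings
    Returns (negative, positive), each a list of (token, logit) pairs.
    """
    # Project the feature direction straight onto the vocabulary,
    # ignoring the final LayerNorm and later layers (a simplifying assumption).
    logits = decoder_dir @ W_U
    order = np.argsort(logits)  # indices sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with random weights; real values would come from a trained model and SAE.
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 20
W_U = rng.normal(size=(d_model, vocab_size))
vocab = [f"tok{i}" for i in range(vocab_size)]
decoder_dir = rng.normal(size=d_model)
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=3)
for token, value in pos:
    print(f"{token:>8} {value:+.2f}")
```

Because only the direct path through the unembedding is used, these numbers describe which tokens the feature pushes toward or away from in isolation, not its full effect through downstream layers.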
    Activation Density: 0.096%
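Activation density is the fraction of tokens on which the feature is active, so 0.096% means it fires on roughly one token in a thousand. A minimal sketch, assuming "active" means an activation above zero over some sample of tokens (the sample and threshold used for this page are not stated); `activation_density` is a hypothetical helper and the data are synthetic.

```python
import numpy as np

def activation_density(feature_acts, threshold=0.0):
    """Fraction of tokens on which a feature's activation exceeds `threshold`."""
    acts = np.asarray(feature_acts)
    return float(np.mean(acts > threshold))

# Synthetic data: a feature that fires on 96 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:96] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.096%
```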

    No Known Activations