INDEX
    Explanations

    instances of the pronoun "I" and related self-referential expressions

    Negative Logits
    Token         Logit
    "iae"         -0.15
    "acon"        -0.15
    " preview"    -0.14
    "itom"        -0.14
    "orough"      -0.14
    "uide"        -0.13
    "117"         -0.13
    "커"          -0.13
    "iaz"         -0.13
    "opi"         -0.13
    Positive Logits
    Token            Logit
    " suspect"       0.33
    " sur"           0.31
    " infer"         0.30
    " suspects"      0.28
    " assume"        0.27
    " wonder"        0.27
    " assumption"    0.27
    " inference"     0.26
    " assumes"       0.26
    " ded"           0.25
    Act Density 0.173%

    No Known Activations