INDEX
    Explanations

    The speaker's first-person self-references: instances of "I", including forms where it is contracted with a verb (e.g. "I'm", "I've").

    Negative Logits
        Token (quoted; leading spaces are significant)    Logit
        '32'                                              -0.08
        '660'                                             -0.07
        '66'                                              -0.07
        ' show'                                           -0.07
        '494'                                             -0.07
        '-out'                                            -0.07
        '_net'                                            -0.07
        '00'                                              -0.07
        ' over'                                           -0.07
        ' around'                                         -0.07
    Positive Logits
        Token (quoted; leading spaces are significant)    Logit
        ' I'                                              0.26
        'I'                                               0.19
        '"I'                                              0.16
        ' i'                                              0.15
        ',I'                                              0.15
        '“I'                                              0.14
        '—I'                                              0.14
        '-I'                                              0.14
        '.I'                                              0.14
        '(I'                                              0.14
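    Dashboards of this kind commonly produce the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest and lowest scores. The sketch below assumes that procedure; the function name `feature_logit_effects`, the row-per-residual-dimension layout of `unembed`, and the toy numbers are hypothetical, and a real dashboard may add normalization or other steps not shown here.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Project a feature's decoder direction onto every vocabulary token.

    Hypothetical layout: decoder_dir has length d_model, and unembed has
    d_model rows, each with one column per token in vocab.
    Returns (positive, negative) lists of (token, logit) pairs.
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per residual dimension")

    # Direct logit effect of the feature on each token: dot product of the
    # decoder direction with that token's unembedding column.
    logits = [
        sum(decoder_dir[i] * unembed[i][v] for i in range(d_model))
        for v in range(len(vocab))
    ]

    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example with four tokens (made-up values).
    vocab = [" I", "I", " show", " over"]
    unembed = [[0.2, 0.1, -0.05, 0.0], [0.1, 0.2, 0.0, -0.2]]
    pos, neg = feature_logit_effects([1.0, 0.5], unembed, vocab, k=2)
    print(pos)
    print(neg)
```

    In this toy case " I" and "I" rank as the top positive tokens, while " over" and " show" rank as the most negative, mirroring the shape of the lists above without reproducing their values.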
    Activation Density: 0.516%
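    Activation density is typically the share of sampled token positions on which the feature is active, so 0.516% corresponds to roughly 5 of every 1,000 tokens. A minimal sketch, assuming "active" means an activation strictly above a threshold of zero; the function name and threshold parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions whose activation exceeds threshold."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation value")
    return 100.0 * active / total


if __name__ == "__main__":
    # 1,000 token positions, 5 of which fire: density of 0.5%.
    acts = [0.0] * 995 + [1.2] * 5
    print(activation_density(acts))
```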

    No Known Activations