INDEX
    Explanations

    instances of the pronoun "I" and related personal statements or reflections

    Negative Logits
    Token          Logit
    "ibold"        -0.16
    "utan"         -0.15
    "apon"         -0.15
    "exo"          -0.15
    " Paper"       -0.15
    " Ye"          -0.14
    "Elapsed"      -0.13
    " Remarks"     -0.13
    "AGON"         -0.13
    " Clay"        -0.13

    Positive Logits
    Token          Logit
    " ever"        0.20
    " memory"      0.19
    " hadn"        0.19
    " EVER"        0.17
    " correctly"   0.17
    "memory"       0.16
    " entend"      0.16
    "plx"          0.15
    " were"        0.15
    "ëĿ¼ëıĦ"       0.15
    Activation Density: 0.033%

    No Known Activations