INDEX
    Explanations

    references to the pronoun "her" and related possessive forms

    Negative Logits
    "eous"      -0.17
    "leine"     -0.15
    "Ĥæķ°"      -0.15
    "hiba"      -0.15
    "sse"       -0.15
    " Washer"   -0.14
    "abox"      -0.14
    "arakter"   -0.14
    "wy"        -0.14
    "ful"       -0.14
    Positive Logits
    "editary"   0.27
    "/her"      0.25
    "esy"       0.20
    "/she"      0.17
    "ding"      0.16
    " own"      0.16
    "/us"       0.15
    "itable"    0.15
    "ewith"     0.15
    "din"       0.15
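    On dashboards of this kind, the Negative/Positive Logits lists are typically the feature's direct effect on the output vocabulary: the feature's decoder direction multiplied by the model's unembedding matrix, then sorted. The sketch below assumes that definition (some implementations differ in details such as layer-norm handling). The function names, tokens, and weights are hypothetical toy values, not taken from this page or from any library.

```python
# Sketch, assuming "logits" here means the feature's direct logit effect:
# decoder_dir (length d_model) projected through the unembedding W_U (d_model x vocab).
# All names and numbers are hypothetical toy values, not real model weights.

def direct_logit_effects(decoder_dir, unembed):
    """Return one logit contribution per vocab token: decoder_dir @ unembed."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(tokens, logits, k):
    """Return the k highest and k lowest (token, logit) pairs."""
    pairs = list(zip(tokens, logits))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


tokens = ["/her", "/she", "eous", " own"]  # toy vocabulary of 4 tokens
decoder_dir = [1.0, 0.5]                    # toy d_model = 2
unembed = [                                 # toy W_U, shape (d_model, vocab)
    [0.2, 0.1, -0.1, 0.1],
    [0.1, 0.1, -0.2, 0.0],
]
logits = direct_logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(tokens, logits, k=2)
print("positive:", positive)
print("negative:", negative)
```

    With a real model the vocabulary has tens of thousands of tokens, so the top and bottom ten entries are only the extremes of a long ranked list; note that leading spaces (as in " own") are part of the token.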
    Activation Density: 0.262%
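    Activation density is commonly reported as the percentage of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above 0; `activation_density` is a hypothetical helper, not a library function, and the activation values are made up.

```python
# Sketch, assuming activation density = percentage of sampled token positions
# where the feature's activation exceeds a threshold (0 by default).
# activation_density is a hypothetical helper, not a library function.

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation is above threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy data: 262 active positions out of 100,000 (made-up activation values).
acts = [0.0] * 100_000
for i in range(262):
    acts[i * 300] = 1.5
print(f"{activation_density(acts):.3f}%")
```

    Under this definition, a density of 0.262% means the feature is active on roughly 262 of every 100,000 tokens, i.e. it is a sparse feature.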

    No Known Activations