INDEX
    Explanations

    mentions of the pronoun 'her' in various contexts

    Negative Logits
    s         -0.18
    sw        -0.17
    eum       -0.16
    rans      -0.15
    lass      -0.15
    e         -0.15
    (-        -0.15
    swap      -0.15
    eg        -0.15
    ymoon     -0.14
    Positive Logits
    editary   0.29
    /us       0.26
    /her      0.25
     own      0.24
    ding      0.23
    zelf      0.21
    esy       0.20
    ewith     0.19
    SELF      0.19
    -même     0.19
    Activation Density: 0.132%

    No Known Activations