INDEX
    Explanations

    mentions of a particular female character

    mentions of the pronoun 'her'

    Negative Logits
    Token           Logit
    "ype"           -0.93
    "ypes"          -0.90
    "kefeller"      -0.90
    "ornia"         -0.89
    "arios"         -0.88
    "VERTIS"        -0.76
    "brush"         -0.70
    "eers"          -0.70
    "ollo"          -0.70
    "ablishment"    -0.69
    Positive Logits
    Token               Logit
    "ding"              1.26
    " own"              1.26
    " daughter"         1.13
    "metic"             1.12
    " husband"          1.12
    "cule"              1.09
    " granddaughter"    1.06
    "nia"               1.04
    "ded"               1.02
    " Majesty"          1.02
    Activation Density: 0.113%

    No Known Activations