    Explanations

    mentions of a particular female individual

    repeated references to the pronoun "her."

    Negative Logits
        "Process"    -0.57
        "rahim"      -0.56
        "ivo"        -0.55
        "gaard"      -0.54
        "DX"         -0.54
        "otation"    -0.54
        "rencies"    -0.54
        "lite"       -0.54
        "çīĪ"        -0.53
        "agos"       -0.53
    Positive Logits
    (Tokens are quoted; a leading space is part of the token.)
        " her"       3.09
        " hers"      2.57
        " herself"   2.50
        "Her"        2.14
        " she"       2.02
        "She"        1.94
        " HER"       1.90
        " Her"       1.81
        "she"        1.76
        " She"       1.74
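    The page does not say how these logit scores were computed. A common approach for sparse-autoencoder feature dashboards, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and sort the per-token scores: the highest values are the tokens whose logits the feature most directly raises, the lowest are those it most directly suppresses. The function `top_logits`, the toy vocabulary, and all numbers below are hypothetical illustrations, not data from this feature.

```python
# Hypothetical sketch: rank tokens by the direct logit effect of one feature,
# assuming score[token] = sum_i decoder_dir[i] * unembed[i][token].

def top_logits(decoder_dir, unembed, vocab, k=10):
    """Return (top_positive, top_negative) lists of (token, score) pairs.

    decoder_dir: list of d_model floats (the feature's decoder row).
    unembed: d_model rows, each a list of len(vocab) floats.
    vocab: token strings; a leading space is part of the token.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("decoder_dir and unembed must share d_model")
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    # Sort ascending once: the head is most negative, the tail most positive.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = ranked[::-1][:k]
    return top_positive, top_negative


# Toy example with d_model = 2 and a four-token vocabulary (made-up values).
vocab = [" her", " she", "rahim", " the"]
decoder_dir = [1.0, 0.5]
unembed = [
    [2.0, 1.5, -0.5, 0.1],
    [1.0, 1.0, -0.2, 0.0],
]
pos, neg = top_logits(decoder_dir, unembed, vocab, k=2)
print(pos)
print(neg)
```

    Sorting once and slicing both ends keeps the two lists consistent with each other; with a real model the same computation is a single matrix-vector product over the full vocabulary.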
    Activation Density: 0.097%
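    Activation density is commonly defined, and assumed here, as the fraction of tokens in a sample corpus on which the feature's activation is above zero. Under that reading, 0.097% means roughly 97 activating tokens per 100,000, or about 1 in 1,000. The function `activation_density` and its `threshold` parameter are hypothetical, and the activation values below are made up.

```python
# Hypothetical sketch: activation density as the share of tokens whose
# feature activation exceeds a threshold (zero by default).

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up example: 97 active tokens out of 100,000.
acts = [0.0] * 99_903 + [1.0] * 97
print(f"{activation_density(acts):.3%}")
```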

    No Known Activations