INDEX
    Explanations

    repeated references to a singular female subject

    Negative Logits
    resse       -0.18
    eum         -0.16
    leneck      -0.15
    ouis        -0.15
    lass        -0.15
    gger        -0.15
    cone        -0.15
    osite       -0.15
    ivia        -0.15
    cliffe      -0.15

    Positive Logits
    /us          0.28
    /her         0.23
    editary      0.23
    ding         0.22
    zelf         0.19
    /th          0.18
    ewith        0.18
    ded          0.17
    etical       0.17
    esy          0.17

    Activation Density: 0.121%

    No Known Activations