INDEX
    Explanations

    phrases related to female individuals

    references to a specific individual, focusing on their actions and attributes

    Negative Logits
    Token           Logit
    "ype"           -0.91
    "ypes"          -0.91
    "kefeller"      -0.87
    "ornia"         -0.86
    "inctions"      -0.70
    "brush"         -0.69
    "arios"         -0.69
    "VERTIS"        -0.67
    "ENSE"          -0.66
    "ozy"           -0.66
    Positive Logits
    Token           Logit
    " husband"      1.35
    "ding"          1.34
    "metic"         1.23
    "cule"          1.19
    " herself"      1.16
    " daughter"     1.08
    "ded"           1.07
    "nia"           1.05
    " boyfriend"    1.03
    " Majesty"      1.01
    Activation Density: 0.118%

    No Known Activations