    Explanations

    phrases or words related to interactions with or actions towards a specific female character

    references to a specific female subject

    Negative Logits
    ype             -1.00
    ypes            -0.93
    Ĥª              -0.77
    ornia           -0.77
    arios           -0.76
    hovah           -0.75
    kefeller        -0.74
    anamo           -0.74
    ffic            -0.73
    ormons          -0.72

    Positive Logits
     husband         1.41
     daughter        1.35
     granddaughter   1.25
     niece           1.23
     boyfriend       1.19
     sister          1.16
     breasts         1.12
     daughters       1.12
     purse           1.08
     grandson        1.08

    Act Density 0.098%

    No Known Activations