INDEX
    Explanations

    phrases related to a specific female individual

    references to a specific female character and her experiences

    Negative Logits
    Token            Logit
    "arios"          -0.83
    "ablishment"     -0.82
    "ozy"            -0.81
    "ypes"           -0.79
    "ornia"          -0.77
    "±"              -0.77
    "ype"            -0.76
    ":/"             -0.72
    "uck"            -0.70
    "oft"            -0.69
    Positive Logits
    Token               Logit
    " own"              1.21
    " granddaughter"    1.10
    "ding"              1.09
    " daughter"         1.08
    "metic"             1.04
    "itage"             1.00
    " niece"            1.00
    " daughters"        0.97
    " grandchildren"    0.96
    " assailant"        0.96
    Activation Density: 0.100%

    No Known Activations