INDEX
    Explanations

    references to male characters and their actions or states

    Negative Logits
    ".dk"       -0.16
    "obl"       -0.15
    "ерп"       -0.15
    "oker"      -0.15
    "BO"        -0.15
    "оба"       -0.15
    "exion"     -0.14
    "ackers"    -0.14
    "ful"       -0.14
    "logg"      -0.14
    Positive Logits
    "/she"       0.21
    "/her"       0.18
    "idi"        0.17
    " rip"       0.17
    "kul"        0.17
    "ady"        0.16
    "idelberg"   0.16
    " Kah"       0.15
    " ["         0.15
    " Majesty"   0.15
    Act Density 0.361%

    No Known Activations