INDEX
    Explanations

    references to girls and women in various contexts

    Negative Logits
    elay       -0.17
    vig        -0.16
    esus       -0.16
    ibold      -0.16
    byss       -0.16
    forth      -0.16
    abis       -0.15
    amsung     -0.15
    kdir       -0.15
    bsolute    -0.15
    Positive Logits
    hood       0.24
    -boy       0.20
    boys       0.17
    /y         0.16
    /man       0.16
     phép      0.16
    們         0.15
    friends    0.15
    friend     0.15
    انه        0.15
    Act Density 0.063%

    No Known Activations