INDEX
    Explanations

    references to boys or male characters across various contexts

    Negative Logits
    lesh        -0.16
    mente       -0.15
    conti       -0.14
    elay        -0.14
    abbo        -0.14
    lej         -0.14
    нина        -0.14
    ikan        -0.14
    lements     -0.14
     ect        -0.13
    Positive Logits
    friend      0.17
    انه         0.17
    enda        0.16
    insky       0.16
    friends     0.15
    hood        0.15
    _configure  0.15
    وه          0.15
    avier       0.14
    essler      0.14
    Act Density 0.026%

    No Known Activations