INDEX
    Explanations

    references to a specific female character

    Negative Logits
    aroo          -0.20
    heads         -0.15
    uzzi          -0.15
     itself       -0.15
    ائر           -0.15
    ाà¤Ĺत         -0.14
     yourselves   -0.14
     pedig        -0.14
    jac           -0.13
    issance       -0.13
    Positive Logits
     own          0.37
    /us           0.35
    editary       0.34
    /her          0.30
    esy           0.29
    ding          0.26
    etical        0.24
    mits          0.24
    etics         0.23
    SELF          0.23
    Activation Density: 0.063%

    No Known Activations