    NEGATIVE LOGITS
    Token          Logit
    "ogun"         -0.65
    "heddar"       -0.63
    "Roger"        -0.62
    "idated"       -0.61
    "solete"       -0.60
    " prostate"    -0.60
    "Jimmy"        -0.60
    "querque"      -0.60
    "emetery"      -0.59
    "jandro"       -0.59
    POSITIVE LOGITS
    Token          Logit
    " herself"     1.77
    " Devi"        1.08
    " she"         0.95
    "eva"          0.91
    " miscar"      0.89
    " gigg"        0.86
    "she"          0.86
    " her"         0.85
    " hijab"       0.84
    " heroine"     0.84
    Activation Density: 0.248%

    No Known Activations