    NEGATIVE LOGITS
     jente
    -0.07
     violent
    -0.07
    	open
    -0.07
    ايات
    -0.07
    958
    -0.06
    -0.06
    ###############################################################################↵
    -0.06
     wives
    -0.06
     AXIS
    -0.06
    νας
    -0.06
    POSITIVE LOGITS
     discrimination
    0.13
    iscrimination
    0.09
     discrim
    0.08
     Discrim
    0.08
     discriminatory
    0.08
     discriminate
    0.08
     discard
    0.07
     discretion
    0.07
     disclosure
    0.07
    uld
    0.07
    Act Density 0.006%

    No Known Activations