INDEX
    Explanations

    social issues related to inequality and gender biases

    Negative Logits
    Token          Logit
    "hyde"         -0.73
    "eters"        -0.72
    "oths"         -0.70
    "ategory"      -0.69
    "ĸļ"           -0.67
    "ptin"         -0.66
    "aryn"         -0.66
    "eteria"       -0.63
    " Canaver"     -0.62
    "leted"        -0.62
    Positive Logits
    Token          Logit
    "enough"       1.22
    "bye"          1.15
    "luck"         1.02
    " luck"        1.01
    " intentions"  0.97
    "reads"        0.94
    " enough"      0.89
    "nat"          0.87
    " Samar"       0.86
    "sell"         0.86
    Activation Density: 3.623%

    No Known Activations