INDEX
    Explanations

    words related to societal hierarchy, shame, and powerlessness

    Negative Logits
    <bos>
    -0.65
     فريبيس
    -0.63
    gonic
    -0.59
     hereof
    -0.57
    ldorf
    -0.56
    }.
    
    -0.56
    AndroidJUnit
    -0.56
     NUKAT
    -0.55
    ($__
    -0.54
    rifugal
    -0.54
    Positive Logits
    ########.
    0.66
    月号
    0.50
     senaste
    0.46
    0.46
     pacchetto
    0.45
    RegressionTest
    0.45
    featureID
    0.44
     instalar
    0.44
    enskap
    0.43
     importanza
    0.43
    Activation Density: 0.245%

    No Known Activations