INDEX
    Explanations

    references to societal issues and human rights violations

    Negative Logits
    Token          Logit
    "qi"           -0.16
    " اÙĦتس"        -0.15
    "dez"          -0.14
    " orb"         -0.14
    "agraph"       -0.14
    "quat"         -0.14
    " Cres"        -0.14
    "stra"         -0.14
    "ово"          -0.14
    "ntl"          -0.14
    Positive Logits
    Token          Logit
    " poor"        0.19
    " towards"     0.17
    "omanip"       0.17
    " others"      0.16
    "äch"          0.16
    " Others"      0.15
    " Gle"         0.15
    "/animate"     0.15
    " collateral"  0.15
    " women"       0.14
    Activation Density: 0.217%

    No Known Activations