INDEX
    Explanations

    negative perceptions

    Negative Logits
    " genoc"          -0.08
    " corrupción"     -0.08
    " gdje"           -0.08
    " schrijf"        -0.08
    "eli"             -0.08
    "டுத்த"           -0.07
    " corrupção"      -0.07
    " wydar"          -0.07
    " ekstrem"        -0.07
    " dónde"          -0.07
    Positive Logits
    "aspect"          0.09
    " aspect"         0.08
                      0.08
    "Aspect"          0.08
    "ární"            0.08
    " rethink"        0.08
    "-grade"          0.08
    "erman"           0.08
    " upt"            0.08
    "ertje"           0.07
    Activation Density: 0.093%

    No Known Activations