INDEX
    Explanations

    oppression, discrimination, exploitation

    Negative Logits
    "ية"              0.57
    " ativos"         0.52
    "После"           0.52
    "Q"               0.50
    "O"               0.49
                      0.49
    " árvore"         0.48
    "D"               0.48
    "N"               0.48
    " after"          0.47
    Positive Logits
    " oppression"     0.70
    " oppressed"      0.61
    " coercive"       0.60
    " harassment"     0.58
    " restrictive"    0.56
    " abuse"          0.56
    " discrimination" 0.56
    " misuse"         0.55
    " discriminatory" 0.55
    " oppressive"     0.54
    Act Density 0.443%

    No Known Activations