    Explanations

    human rights and privacy

    NEGATIVE LOGITS
    "'"               0.66
    "Wit"             0.43
    "r"               0.43
    " εν"             0.42
    " unsupervised"   0.41
    "Trinity"         0.40
    "输入"            0.40
    "iciencia"        0.39
    "目的地"          0.39
    "l"               0.39
    POSITIVE LOGITS
    " violations"     0.77
    "violations"      0.72
    " violation"      0.70
    " Violation"      0.68
    " حقوق"           0.65
    " अधिकारों"        0.64
    "rights"          0.61
    " direitos"       0.59
    " उल्लंघन"         0.59
    " violated"       0.57
    Activation Density: 0.035%

    No Known Activations