INDEX
    Explanations

    mentions of human rights violations and their consequences

    Negative Logits
    Token              Logit
    ".addHandler"      -0.16
    "ekl"              -0.15
    "ogg"              -0.15
    "ánh"              -0.14
    "oci"              -0.14
    "цин"              -0.13
    " Hoch"            -0.13
    " неприят"         -0.13
    "obra"             -0.13
    "اني"              -0.13
    Positive Logits
    Token              Logit
    " arbitrary"       0.34
    " extr"            0.31
    " summary"         0.31
    " Arbitrary"       0.28
    " disappear"       0.28
    " torture"         0.27
    " Extr"            0.26
    " extra"           0.26
    "summary"          0.25
    " Summary"         0.24
    Act Density 0.040%

    No Known Activations