INDEX
    Explanations

    references to persecution or mistreatment

    Negative Logits
    inho        -0.17
    oids        -0.15
    ucle        -0.15
    itest       -0.15
    oid         -0.14
    iad         -0.14
    ural        -0.14
    ald         -0.14
     sophistic  -0.14
    atura       -0.14
    Positive Logits
     by         0.19
    ë°Ľ         0.17
    dorf        0.16
     تÙĪØ³Ø·    0.15
     oleh       0.15
    ress        0.15
    undi        0.15
    227         0.15
     applied    0.14
    inator      0.14
    Act Density 0.263%

    No Known Activations