INDEX
    Explanations

    references to violence and conflict

    Negative Logits
    质
    -0.16
    ils
    -0.15
     éĹ
    -0.15
    ográf
    -0.15
    modo
    -0.14
    onn
    -0.14
    uge
    -0.14
    ODE
    -0.14
    church
    -0.14
    шев
    -0.14
    Positive Logits
     by
    0.29
     bợi
    0.20
     oleh
    0.20
    _by
    0.17
    ANI
    0.15
    by
    0.15
    /lic
    0.14
     توسط
    0.14
     pelos
    0.14
    nak
    0.14
    Act Density 0.177%

    No Known Activations