INDEX
    Explanations

    instances of violence, breaches, and notable figures in reports and discussions

    Negative Logits
    arsing      -0.17
    .labelX     -0.16
    辦          -0.15
    IQUE        -0.14
    или         -0.14
    chers       -0.14
     Birch      -0.14
    keiten      -0.14
    egen        -0.14
    εÏį         -0.13
    Positive Logits
     being      0.29
    being       0.25
     Being      0.20
    Being       0.19
     sendo      0.18
     essere     0.16
    _basename   0.15
     therein    0.15
    owell       0.14
     být        0.14
    Activation Density: 0.101%

    No Known Activations