    Explanations

    phrases indicating serious offenses or allegations

    Negative Logits
    " ÃŃch"      -0.16
    "uv"         -0.15
    "429"        -0.14
    "ToProps"    -0.14
    "icine"      -0.14
    "ë§Ľ"        -0.14
    "enheim"     -0.14
    ".bam"       -0.14
    "à¹Ĥล"       -0.14
    " Smy"       -0.13
    Positive Logits
    " serious"        0.86
    " Serious"        0.72
    "serious"         0.70
    " seriousness"    0.68
    " seriously"      0.61
    "-ser"            0.60
    " grave"          0.55
    " severe"         0.55
    " seri"           0.53
    "Ser"             0.52
    Activation Density: 0.070%

    No Known Activations