    Explanations

    instances of rule-breaking or violations

    Negative Logits
    Token            Logit
    "00200000"       -0.71
    "vice"           -0.69
    "ãĥ¼ãĥĨãĤ£"      -0.66
    "tesque"         -0.65
    "hate"           -0.65
    "Hell"           -0.64
    "Bank"           -0.64
    "mega"           -0.64
    "question"       -0.63
    " Investor"      -0.62
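
The token "ãĥ¼ãĥĨãĤ£" in the list above looks like the printable form that byte-level BPE vocabularies use for non-ASCII bytes. The sketch below assumes a GPT-2-style byte-to-character mapping (the page itself does not name the tokenizer, so this is an assumption) and maps each character back to its byte before decoding as UTF-8. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of the dashboard.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character map, as used by GPT-2-style byte-level BPE (assumed here)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token: str) -> str:
    """Turn a vocabulary-form token back into readable text."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # Single tokens can hold partial UTF-8 sequences, so replace rather than raise.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ãĥ¼ãĥĨãĤ£"))
```

Under that assumption the token decodes to the katakana fragment "ーティ". If the dashboard's model uses a different tokenizer, the mapping would need to be swapped for that tokenizer's own.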
    Positive Logits
    Token              Logit
    " curfew"           0.74
    " fins"             0.72
    " vaccinations"     0.71
    " performance"      0.71
    " liberties"        0.70
    " tranqu"           0.70
    " transmissions"    0.69
    " immersion"        0.69
    " vaccination"      0.69
    "orius"             0.68
    Activation Density: 0.416%

    No Known Activations