INDEX
    Explanations

    instances of the word "violation" and related terms indicating breaches or infractions of rules or laws

    Negative Logits
    ãĤ      -0.15
    rolled  -0.15
    anh     -0.15
    ingo    -0.15
    oha     -0.14
    DAQ     -0.14
    iao     -0.14
    uro     -0.13
    opol    -0.13
    ildo    -0.13
    Positive Logits
    šek             0.15
    ustin           0.15
    hou             0.14
     Buckley        0.14
    éģĬ             0.14
    ück             0.13
     Vit            0.13
    asmus           0.13
    .scalablytyped  0.13
    761             0.13
    Activation Density: 0.006%

    No Known Activations