INDEX
Explanations
instances of the word "violation" and related terms indicating breaches or infractions of rules or laws
Negative Logits
ãĤ      -0.15
rolled  -0.15
anh     -0.15
ingo    -0.15
oha     -0.14
DAQ     -0.14
iao     -0.14
uro     -0.13
opol    -0.13
ildo    -0.13
Positive Logits
šek             0.15
ustin           0.15
hou             0.14
Buckley         0.14
éģĬ             0.14
ück             0.13
Vit             0.13
asmus           0.13
.scalablytyped  0.13
761             0.13
Activations Density: 0.006%