Explanations
terms associated with legal or moral violations
Negative Logits
awtextra       -0.78
informatics    -0.71
ISNI           -0.69
worauf         -0.68
Архівовано     -0.64
proxy          -0.61
golf           -0.60
Resili         -0.60
Gus            -0.58
сре            -0.58
Positive Logits
violation      1.32
violations     1.29
violate        1.24
violating      1.23
violated       1.21
Violation      1.18
violation      1.16
Violations     1.16
violates       1.12
Viol           1.11
Activations Density: 0.075%