Explanations
words related to legal violations
references to legal violations and infringements
Negative logits
Tycoon          -0.83
arger           -0.81
urai            -0.68
mad             -0.68
roth            -0.67
apult           -0.66
bearded         -0.65
wow             -0.64
anka            -0.63
consolation     -0.63
Positive logits
violations      0.96
viol            0.86
violation       0.80
Viol            0.71
mson            0.70
Behavior        0.70
unfocusedRange  0.68
punishable      0.67
behaviors       0.67
infring         0.67
Activation density: 0.035%
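Activation density on dashboards like this is generally taken to be the share of token positions on which the feature's activation is above zero. The sketch below assumes that definition (threshold 0.0) and uses made-up activations; `activation_density` is a hypothetical helper, not part of any dashboard or library API. It only illustrates the arithmetic: a density of 0.035% corresponds to, for example, 7 firing positions out of 20,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly above 0.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires at 7 of 20,000 token positions.
acts = [0.0] * 20_000
for i in range(7):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # 0.035%
```

In other words, the feature is silent on the vast majority of tokens, which fits its narrow focus on references to legal violations and infringements.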