INDEX
Explanations
references to financial penalties or consequences
references to fines and penalties
Negative Logits
oir         -0.71
groups      -0.70
ipel        -0.69
Ops         -0.69
avez        -0.67
population  -0.66
omorph      -0.65
ITNESS      -0.65
owers       -0.64
abad        -0.63
Positive Logits
fines       1.00
levied      0.88
confir      0.84
fined       0.83
viol        0.81
alties      0.79
violations  0.72
ensing      0.71
redes       0.70
Bunny       0.70
Activations Density 0.008%