Explanations
references to legal violations
Negative Logits
mad      -0.87
arger    -0.77
rich     -0.73
anka     -0.72
abs      -0.71
rose     -0.69
opped    -0.69
azor     -0.67
iris     -0.66
venture  -0.65
Positive Logits
laws             0.89
Laws             0.85
statutes         0.82
confidentiality  0.81
violations       0.80
norms            0.80
curfew           0.76
unfocusedRange   0.75
laws             0.75
antitrust        0.74
Activations Density 0.019%