Explanations
phrases related to violations or infractions
references to legal violations or breaches of rights
Negative Logits
Token      Logit
acs        -0.76
onna       -0.73
uilding    -0.69
esses      -0.67
hin        -0.66
ppa        -0.66
merry      -0.66
reau       -0.65
gre        -0.64
igrants    -0.63
Positive Logits
Token              Logit
norms              1.11
protocol           0.85
norm               0.85
rights             0.84
confidentiality    0.84
jurisdiction       0.82
Ö¼                 0.82
principles         0.81
principle          0.80
ĨĴ                 0.80
Activations Density 0.194%
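
The density figure above is usually read as the fraction of token positions on which the feature fires. The following is a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the toy data are all hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature is
    active (activation > threshold). The dashboard's exact definition is not
    stated in the source.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: the feature fires on 2 of 1000 token positions.
toy_activations = [0.0] * 998 + [1.3, 0.4]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

On this reading, a density of 0.194% would correspond to the feature being active on roughly 2 tokens in every 1,000.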