INDEX
Explanations
references to laws and their enforcement status
Negative Logits
-0.18
prohibited
-0.17
forbidden
-0.15
(
-0.15
barred
-0.15
Valle
-0.15
Lazy
-0.15
categor
-0.15
eken
-0.15
banned
-0.14
Positive Logits
enforce       0.28
applied       0.27
violated      0.26
binding       0.26
enforced      0.24
aplic         0.23
binding       0.23
Binding       0.23
Binding       0.22
Enforcement   0.22
Activations Density 0.236%