INDEX
Explanations
mentions of policies and guidelines
references to policies, particularly in formal contexts
Negative Logits
ITNESS       -0.80
nces         -0.78
ingly        -0.75
Flavoring    -0.72
ymes         -0.71
Stain        -0.70
Ago          -0.69
burn         -0.69
semble       -0.67
htaking      -0.66
Positive Logits
prohibiting    0.93
enforcement    0.92
holders        0.91
makers         0.90
Enforcement    0.87
restricting    0.81
holder         0.81
abiding        0.80
making         0.78
enforcement    0.77
Activations Density: 0.032%