INDEX
Explanations
phrases related to rules, policies, and standards
Negative Logits
issance   -0.91
ilus      -0.84
fortune   -0.80
          -0.73
jiang     -0.72
joy       -0.71
ience     -0.70
asca      -0.69
ortun     -0.67
semble    -0.65
Positive Logits
imposed        1.18
governing      1.09
enforced       1.09
dictates       1.04
guidelines     1.02
criteria       1.01
requirements   1.01
stip           1.00
rules          0.98
deviation      0.96
Activations Density 1.307%
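
For readers unfamiliar with the density figure, here is a minimal sketch of one common way such a number could be computed: the fraction of sampled token positions on which the feature's activation is above a threshold. The page itself does not state its definition, so the function name `activation_density`, the `threshold` parameter, and the toy data are all assumptions made for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition for illustration; the dashboard does not document its own.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 13 of which are active.
toy_activations = [0.0] * 987 + [0.5] * 13
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 1.307% would mean the feature fires on roughly 13 of every 1000 sampled tokens.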