INDEX
Explanations
words related to laws, regulations, or guidelines
references to rules and regulations
Negative Logits
Bridge -0.69
Works -0.68
ollen -0.66
unk -0.62
onto -0.61
awks -0.61
ivered -0.60
imb -0.60
anse -0.60
Hopkins -0.59
Positive Logits
rule 4.07
Rule 2.90
rule 2.73
Rule 2.54
rules 2.17
rules 1.90
ruled 1.83
Rules 1.80
rul 1.71
Rules 1.60
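The two lists above rank tokens by how strongly this feature pushes their logits down or up. The sketch below shows one way such lists could be produced. It assumes each entry pairs a vocabulary token with the feature's direct effect on that token's logit. In many SAE dashboards that effect is the feature's decoder vector projected through the model's unembedding, but that is an assumption here, not something this page states. The function name `top_and_bottom_logits` is hypothetical. A list of pairs, rather than a dict, keeps repeated display strings such as `rule` as separate entries.

```python
from typing import List, Tuple

def top_and_bottom_logits(
    effects: List[Tuple[str, float]], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs.

    Hypothetical helper: `effects` is assumed to hold one (token, logit effect)
    pair per vocabulary entry.
    """
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by effect
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom
```

For example, `top_and_bottom_logits([("rule", 4.07), ("Bridge", -0.69), ("Works", -0.68), ("Rule", 2.90)], k=2)` returns `[("rule", 4.07), ("Rule", 2.90)]` as the top list and `[("Bridge", -0.69), ("Works", -0.68)]` as the bottom list.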
Activations Density 0.009%
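The following sketch computes a density figure like the one above. It assumes that activation density means the percentage of tokens in a sample on which the feature's activation is above zero. The zero threshold and the function name `activation_density` are assumptions for illustration. Under that reading, 0.009% corresponds to roughly 9 active tokens per 100,000.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Hypothetical helper: the threshold of 0.0 is an assumed definition of "active".
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

For example, a sample of 100,000 tokens with exactly 9 positive activations gives a density of 0.009 (percent).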