INDEX
Explanations
general rules, procedures, or standards that are being applied or enforced
statements regarding rules and their applicability
Negative Logits
omes        -0.70
osal        -0.69
otide       -0.67
OGR         -0.65
ãĥīãĥ©      -0.65
asted       -0.62
unks        -0.62
ortment     -0.62
edes        -0.61
cipl        -0.61
Positive Logits
applies      0.85
apply        0.80
GOODMAN      0.79
exclusively  0.77
sparing      0.77
uniformly    0.75
ourgeois     0.72
unfairly     0.72
alties       0.72
solely       0.72
Activations Density 0.043%