INDEX
Explanations
regulations or prohibitions related to specific actions or groups
references to prohibitions and bans
Negative Logits
patch        -0.77
hematic      -0.70
partName     -0.70
mirac        -0.67
cule         -0.66
figure       -0.66
framework    -0.65
stack        -0.64
ustration    -0.64
Lind         -0.64
Positive Logits
exceptions    0.92
permitted     0.89
permissible   0.84
exemptions    0.83
arate         0.82
allowable     0.81
minors        0.81
prohibited    0.80
unrestricted  0.76
exempt        0.74
Activations Density 0.705%