INDEX
Explanations
specific instances where an action is either allowed or not allowed
words related to permission and authorization
Negative Logits
tal      -0.63
stats    -0.63
guy      -0.63
bust     -0.63
box      -0.62
pop      -0.61
beat     -0.60
Voc      -0.60
tons     -0.60
ðŁ       -0.60
Positive Logits
permitted      3.27
permissible    2.30
prohibited     2.00
allowable      1.85
authorized     1.73
allowed        1.68
forbidden      1.61
permit         1.60
permitting     1.56
authorised     1.54
Activations Density: 0.011%