INDEX
Explanations
mentions of rules or regulations prohibiting certain actions
prohibitions and permissions regarding actions and behaviors
Negative Logits
lves      -0.72
xon       -0.69
office    -0.67
Soldier   -0.62
power     -0.59
tal       -0.59
posure    -0.59
leaf      -0.59
elect     -0.58
lish      -0.58
Positive Logits
Reviewer    1.11
uthor       0.86
exemptions  0.83
ptin        0.75
pedia       0.72
allowed     0.70
disclaim    0.70
deviations  0.69
aston       0.69
permitted   0.68
Activations Density 0.026%