INDEX
Explanations
words related to rules, regulations, and limitations
phrases related to restrictions and limitations on freedoms
Negative Logits
Token       Logit
nexus       -0.70
dig         -0.70
00200000    -0.68
issance     -0.68
asus        -0.66
tein        -0.66
uscript     -0.65
apest       -0.65
Zeit        -0.63
char        -0.63
Positive Logits
Token          Logit
imposed        1.52
restricting    1.13
placed         1.12
levied         1.12
prohibiting    1.11
enforced       1.07
preventing     1.00
inhib          0.97
exerted        0.96
lifted         0.94
Activations Density: 0.138%
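
The density figure above can be read as the share of token positions on which the feature is active. The following is a minimal sketch of that calculation, assuming density means the fraction of sampled tokens with an activation above zero; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The dashboard's exact definition may differ.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, the feature fires on 2 of them.
toy = [0.0] * 998 + [3.1, 0.7]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.200%
```

Under this reading, a density of 0.138% would correspond to the feature firing on roughly 1 in every 725 sampled tokens.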