INDEX
Explanations
discussions around regulatory restrictions and their impact on creativity and human rights
Negative Logits
しょ  -0.17
æĴ  -0.17
ego  -0.15
pard  -0.14
.Atomic  -0.14
freeing  -0.13
Tort  -0.13
रत  -0.13
859  -0.13
freed  -0.13
Positive Logits
restriction  0.30
ban  0.30
restrictions  0.29
ban  0.28
restrict  0.25
限制  0.25
restrict  0.24
bans  0.24
limits  0.23
_restrict  0.23
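On dashboards of this kind, the negative and positive logit lists are typically the tokens whose output logits the feature's decoder direction most lowers or raises. The following is a minimal sketch that assumes this definition: the decoder row is projected through the unembedding matrix, and the top k tokens are taken in each direction. The function names, toy matrices, and vocabulary are hypothetical. Any normalization the dashboard applies to its scores is not modeled.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper: project one feature's decoder direction (length d_model)
# through an unembedding matrix stored as d_model rows x vocab_size columns.
def logit_contributions(decoder_row: Sequence[float],
                        unembed: Sequence[Sequence[float]]) -> List[float]:
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][t] for i in range(len(decoder_row)))
        for t in range(vocab_size)
    ]

# Hypothetical helper: return the k most positive and k most negative tokens.
def top_logits(scores: Sequence[float], vocab: Sequence[str], k: int = 10
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    order = sorted(range(len(scores)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    # Slice from len - k so that k = 0 yields an empty list.
    positive = [(vocab[t], scores[t]) for t in reversed(order[len(order) - k:])]
    return positive, negative

# Toy example: d_model = 2, vocabulary of 4 tokens (made-up numbers).
vocab = ["restriction", "freed", "ego", "ban"]
decoder_row = [1.0, 0.5]
unembed = [
    [0.3, -0.1, 0.0, 0.1],
    [0.2, 0.1, -0.4, 0.0],
]
positive, negative = top_logits(logit_contributions(decoder_row, unembed), vocab, k=2)
print("positive:", [(tok, round(s, 2)) for tok, s in positive])
print("negative:", [(tok, round(s, 2)) for tok, s in negative])
```

Some strings, such as "ban" and "restrict", appear twice in the lists. These are probably distinct vocabulary entries that render identically, for example the same word with and without a leading space.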
Activation Density: 0.380%
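Activation density is read here as the share of token positions on which the feature is active. Under that reading, 0.380% means roughly 38 activations per 10,000 tokens. The following is a minimal sketch under that assumption. The zero threshold, the function name, and the sample data are hypothetical, and the dataset the dashboard measures over is not stated here.

```python
from typing import Iterable

# Hypothetical helper: fraction of token positions where the feature fires,
# i.e. where its activation is strictly greater than `threshold`.
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total

# Made-up sample: the feature fires on 38 of 10,000 token positions.
sample = [0.0] * 9962 + [1.2] * 38
print(f"Activation Density: {activation_density(sample):.3%}")
```

The `.3%` format specifier multiplies by 100 and keeps three decimals, which matches the precision shown on the dashboard line.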