Explanations
references to the authorship or credit of works and content
Negative Logits
Token   Logit
aby     -0.16
ampa    -0.15
arks    -0.14
uard    -0.14
bound   -0.14
actor   -0.14
év      -0.14
XC      -0.13
lect    -0.13
ogle    -0.13
Positive Logits
Token          Logit
admin          0.38
Admin          0.31
Admin          0.31
Administrator  0.30
admin          0.30
administrator  0.28
_admin         0.28
Administrator  0.27
_ADMIN         0.27
ADMIN          0.26
Activations Density: 0.029%