Explanations
instances of attribution or authorship related to the content
Negative Logits
token      logit
urdy       -0.16
allen      -0.15
alen       -0.14
18         -0.14
jen        -0.13
landa      -0.13
창         -0.13
Padding    -0.13
arkin      -0.13
EMA        -0.13
Positive Logits
token            logit
admin            0.32
Admin            0.27
Admin            0.26
admin            0.26
_admin           0.24
administrator    0.24
_ADMIN           0.24
Administrator    0.23
ADMIN            0.22
admins           0.22
Activations Density 0.029%