INDEX
Explanations
instances of authorship or attributions in a text
Negative Logits
op            -0.17
ientes        -0.16
ap            -0.16
lands         -0.15
ir            -0.14
am            -0.14
fame          -0.14
Lands         -0.14
avar          -0.14
:#            -0.13
Positive Logits
admin         0.16
ække          0.16
edm           0.15
chwitz        0.14
staff         0.14
gni           0.14
gauss         0.14
diseñador     0.14
едак          0.14
presso        0.14
Activations Density: 0.061%