Explanations
textual elements that indicate authorship or publication information
Negative Logits
Token      Logit
auc        -0.16
oyer       -0.14
unger      -0.14
CAC        -0.14
anno       -0.14
weg        -0.14
oleon      -0.14
uelle      -0.13
linger     -0.13
Charter    -0.13
Positive Logits
Token          Logit
actionTypes    0.17
696            0.16
COPE           0.16
onData         0.15
avicon         0.14
Hüs            0.14
_exc           0.14
698            0.14
atoi           0.14
agon           0.14
Activations Density: 0.022%