Explanations
instances of authorship or speaking attribution in text
Negative Logits
Token     Logit
omor      -0.15
ntl       -0.15
ç¨        -0.15
Dek       -0.15
oso       -0.15
uito      -0.15
ownt      -0.15
avou      -0.14
alis      -0.14
bum       -0.14
Positive Logits
Token     Logit
endency   0.17
andles    0.16
rello     0.16
cox       0.15
tongue    0.15
.psi      0.15
ires      0.15
ugins     0.15
allery    0.14
-toggler  0.14
Activations Density: 0.011%