Explanations
quotation marks or dialogue indicators in the text
Negative Logits
Token     Logit
s         -0.57
I         -0.32
sar       -0.26
sburg     -0.24
ä½Ĩ       -0.24
Ùĩ        -0.24
æŃ¤       -0.23
ska       -0.23
a         -0.23
sik       -0.23
Positive Logits
Token     Logit
Ve        0.18
gether    0.16
.sz       0.15
urent     0.14
/'        0.14
urai      0.14
ainen     0.14
readcr    0.14
odont     0.14
porno     0.13
Activations Density 0.051%