INDEX
Explanations
expressions of self-reflection or rhetorical questions directed at the reader
Negative Logits
tingham    -0.15
зах        -0.15
jerne      -0.15
.sep       -0.14
én         -0.14
atings     -0.14
trys       -0.14
allas      -0.14
tal        -0.13
Sep        -0.13
Positive Logits
MT         0.16
odzi       0.15
defs       0.15
dex        0.15
aura       0.14
odic       0.14
eker       0.14
odal       0.13
slick      0.13
IDEO       0.13
Activations Density 0.123%