Explanations
pronouns and personal references within the text
Negative Logits
LOPT     -0.17
orado    -0.16
drž      -0.16
oÅĻ      -0.15
chân     -0.15
379      -0.14
.ws      -0.14
Ñıн      -0.14
oggler   -0.14
ILER     -0.14
Positive Logits
ijn      0.17
cop      0.15
spath    0.14
vig      0.14
mate     0.14
aced     0.14
levard   0.13
sj       0.13
zig      0.13
emerg    0.13
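Dashboards of this kind typically build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the lowest and highest scoring tokens. The sketch below assumes that procedure; `logit_effects`, the toy `decoder_dir`, `unembed` and `vocab` values are hypothetical and not taken from this page.

```python
from typing import List, Tuple


def logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],  # assumed shape: d_model rows x vocab_size columns
    vocab: List[str],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, score) pairs."""
    d_model = len(decoder_dir)
    # Score each vocabulary token by the dot product of the decoder direction
    # with that token's column of the unembedding matrix.
    scores = [
        sum(decoder_dir[d] * unembed[d][t] for d in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]           # lowest scores first
    positive = ranked[::-1][:k]     # highest scores first
    return negative, positive


# Hypothetical toy example with a 2-dimensional residual stream and 3 tokens.
neg, pos = logit_effects([1.0, -0.5], [[0.2, -0.1, 0.0], [0.1, 0.3, -0.2]], ["a", "b", "c"], k=1)
print(neg, pos)
```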
Activation Density: 0.164%
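Activation density is usually read as the share of sampled token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0; the function name `activation_density` and the threshold are assumptions, since this page does not state how its figure was computed.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: the feature fires on 2 of 1000 token positions.
sample = [0.0] * 998 + [1.3, 0.4]
print(activation_density(sample))
```

Under this reading, a density of 0.164% means the feature is active on roughly 1.6 of every 1000 tokens in the sampled data.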