INDEX
Explanations
structural elements or markers in the text, particularly symbols and formatting characters
Negative Logits
Token             Logit
ambi              -0.16
aran              -0.16
uger              -0.15
aptic             -0.15
eam               -0.14
Ple               -0.14
.sharedInstance   -0.14
handshake         -0.14
elephant          -0.14
ple               -0.14
Positive Logits
Token             Logit
ãĥ¼ãĥĭ            0.18
ãĥ£               0.17
ùi                0.16
kees              0.15
YPE               0.15
pone              0.14
ç´                0.14
üz                0.14
оло               0.14
titled            0.14
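Some of the positive-logit tokens (ãĥ¼ãĥĭ, ãĥ£, ç´) look like raw vocabulary entries from a byte-level BPE tokenizer rather than readable text. The sketch below assumes the model uses the GPT-2 style byte-to-character alphabet, which this page does not state. The helper names `byte_level_alphabet` and `decode_token` are hypothetical and exist only for illustration. It should be applied only to tokens that appear in this byte-encoded form. Entries such as оло or üz already appear as readable text and are not covered by it.

```python
def byte_level_alphabet():
    """Map each byte value (0-255) to the character a GPT-2 style
    byte-level BPE vocabulary uses to display it (assumed convention)."""
    # Bytes that are shown as themselves: '!'..'~', U+00A1..U+00AC, U+00AE..U+00FF.
    printable = (
        list(range(0x21, 0x7F))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    mapping = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to U+0100 and above, in ascending byte order.
    shift = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


# Inverse lookup: displayed character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in byte_level_alphabet().items()}


def decode_token(token):
    """Turn a byte-level vocabulary string back into text.

    Bytes that do not form complete UTF-8 (for example a token that holds
    only part of a multi-byte character) become U+FFFD replacement marks.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥ¼ãĥĭ", "ãĥ£", "ç´"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the first two tokens decode to the katakana fragments ーニ and ャ. The third holds only the first two bytes of a three-byte UTF-8 character, so it cannot be rendered on its own.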
Activations Density 0.039%