INDEX
Explanations
punctuation and structural elements in the text
Negative Logits
aml     -0.17
stry    -0.15
osph    -0.15
igham   -0.15
odu     -0.14
ós      -0.14
rowned  -0.14
itten   -0.14
atrice  -0.14
atri    -0.14
Positive Logits
dbe     0.16
_WAKE   0.15
ales    0.15
Wake    0.15
Tal     0.15
zia     0.14
wake    0.14
ÙĦب     0.14
pone    0.14
ÙĬع     0.14
Activations Density 0.100%