INDEX
Explanations
commas and quotation marks in text, indicating dialogue or quoted speech
Negative Logits
iral     -0.14
ictor    -0.14
ienne    -0.13
lại      -0.13
rol      -0.13
uther    -0.13
(.)      -0.13
ans      -0.13
hl       -0.13
ural     -0.12
Positive Logits
said     0.26
says     0.22
wrote    0.19
he       0.17
said     0.15
writes   0.15
sai      0.15
reads    0.14
Purple   0.14
paraph   0.14
Activations Density 0.091%