Explanations
punctuation marks, particularly commas and quotation marks
NEGATIVE LOGITS
’e      -0.17
“[      -0.17
’ї      -0.15
‘       -0.15
“       -0.15
“Oh     -0.15
(“      -0.15
„M      -0.14
„V      -0.14
„N      -0.14
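Token strings in dashboards like this one are often stored in GPT-2-style byte-level BPE form, where every raw byte is replaced by a printable stand-in character. For example, `âĢŀ` is the byte sequence E2 80 9E, which is the UTF-8 encoding of `„` (U+201E). The sketch below assumes the model uses GPT-2's byte-to-unicode table (an assumption, since the dashboard does not name the tokenizer); `decode_token` is a hypothetical helper name, not part of any library.

```python
def bytes_to_unicode() -> dict[int, str]:
    # GPT-2's mapping: printable bytes stand for themselves, and every other
    # byte is shifted to a code point at 256 or above so it stays visible.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: stand-in character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level token string back into text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("âĢŀM"))  # „M
print(decode_token("ÑĹ"))    # ї
```

Under the same assumption, `Ġ` stands for a leading space (`decode_token("Ġsays")` gives `" says"`), so a token that appears twice in a rendered list may plausibly be a with-space and a without-space variant. Likewise `ÂĿ` decodes to U+009D, an invisible control character, which is why it cannot be shown as a readable glyph.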
POSITIVE LOGITS
says        0.29
said        0.27
ÂĿ          0.25
reads       0.24
read        0.22
say         0.21
he          0.20
according   0.19
says        0.19
wrote       0.19
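The positive and negative logit lists are usually read as the feature's direct effect on the next-token prediction. The sketch below assumes, as is common for sparse-autoencoder dashboards, that each value is the feature's decoder direction projected through the model's unembedding matrix. Some tools also fold in layer norm or centre the scores, which this sketch does not do. `logit_effects` and all the numbers are hypothetical toy values, not this feature's real weights.

```python
import numpy as np


def logit_effects(decoder_dir, W_U, vocab, k=3):
    """Return the k most-promoted and k most-suppressed tokens for one feature."""
    # Direct effect on every vocabulary logit: (d_model,) @ (d_model, vocab) -> (vocab,)
    scores = decoder_dir @ W_U
    order = np.argsort(scores)  # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy setup (hypothetical): d_model = 2, vocabulary of four tokens.
vocab = ["says", "said", "“", "‘"]
W_U = np.array([[1.0, 0.5, -1.0, -0.5],
                [0.0, 0.5,  0.0, -0.5]])
decoder_dir = np.array([0.3, -0.1])

pos, neg = logit_effects(decoder_dir, W_U, vocab, k=2)
print(pos)  # "says" then "said" (scores about 0.3 and 0.1)
print(neg)  # "“" then "‘" (scores about -0.3 and -0.1)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other, mirroring the two columns shown on the dashboard.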
Activation density: 0.108%
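A density of 0.108% means the feature fires on roughly 1 token in 926 (100 / 0.108 ≈ 925.9). The sketch below assumes density is the percentage of sampled token positions where the feature's activation is strictly positive; the dataset and threshold behind this dashboard's figure are not stated, and `activation_density` is a hypothetical helper name.

```python
import numpy as np


def activation_density(activations: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds the threshold."""
    # Boolean mask -> mean gives the firing fraction; scale to a percentage.
    return 100.0 * float(np.mean(activations > threshold))


# Toy example: the feature fires on 1 of 10 token positions.
acts = np.array([0.0, 0.0, 2.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(f"{activation_density(acts):.3f}%")  # 10.000%
```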