INDEX
Explanations
quotation marks indicating dialogue or quotes
Negative Logits
“
-0.39
„
-0.38
(“
-0.30
“[
-0.29
「あ
-0.26
「
-0.25
「
-0.23
「お
-0.23
``
-0.23
“We
-0.22
Positive Logits
[]"
0.26
()"
0.24
()",
0.21
¦
0.20
."↵↵
0.20
()"↵
0.19
!",
0.18
!"
0.18
();"
0.18
?",
0.18
Activations Density 0.488%
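
Byte-level BPE vocabularies store non-ASCII tokens as remapped byte strings, so an opening bracket such as 「 appears in the raw vocabulary as ãĢĮ, and „ as âĢŀ. The sketch below shows one way to map such strings back to readable text. It assumes the model uses the GPT-2-style byte-to-character table, which this page does not state; `bytes_to_unicode` and `decode_token` are illustrative names, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here; the source does not name the tokenizer)."""
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    codepoints = byte_values[:]
    offset = 0
    for b in range(256):
        if b not in byte_values:
            # Non-printable bytes are shifted to code points 256 and above.
            byte_values.append(b)
            codepoints.append(256 + offset)
            offset += 1
    return dict(zip(byte_values, map(chr, codepoints)))


# Invert the table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Turn a raw vocabulary string into readable text.

    Expects a string made only of characters from the byte table; bytes that
    do not form valid UTF-8 become the replacement character.
    """
    data = bytes(UNICODE_TO_BYTE[ch] for ch in raw)
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for raw in ["âĢŀ", "ãĢĮ", "ãĢĮãģĤ", "ãĢĮãģĬ"]:
        print(repr(raw), "->", repr(decode_token(raw)))
```

Read this way, the strongest negative logits are all opening quotation marks or brackets, while the positive logits are dominated by closing quotes.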