Explanations
phrases enclosed in double quotation marks
quotes and dialogue marks in the text
Negative Logits
Token   Logit
!'      -1.14
,'      -1.11
?'      -1.11
.'      -1.09
,'"     -0.94
?'"     -0.89
.'"     -0.89
)'      -0.88
!'"     -0.87
」      -0.84
Positive Logits
Token   Logit
"       2.42
"'      1.99
"[      1.95
"…      1.84
"...    1.79
"#      1.79
"(      1.78
"-      1.71
"$      1.68
".      1.63
Activations Density 0.140%
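
The density figure above is not defined on this page. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The minimal sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 14 of which fire.
sample = [0.0] * 9986 + [1.5] * 14
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.140%
```

Under this reading, a density of 0.140% would mean the feature is active on roughly 14 of every 10,000 tokens, which is consistent with a sparse feature tied to a specific punctuation context.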