Explanations
punctuation marks, specifically quotation marks
Negative Logits
Token               Logit
----                -0.83
-----               -0.76
----------------    -0.73
</strong>           -0.71
(no token shown)    -0.70
--                  -0.68
"                   -0.67
---                 -0.67
‐                   -0.65
(no token shown)    -0.65
Positive Logits
Token    Logit
)»       2.19
?»       2.09
,»       2.09
.»       1.99
!»       1.93
»,       1.88
»-       1.80
»)       1.78
)».      1.73
»?       1.72
Activations Density 0.103%
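
The page does not define the density figure. One common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. Under that assumption, a minimal sketch of the calculation might look like the following; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not the real dataset): 103 active positions out of 100,000 tokens.
toy_activations = [1.0] * 103 + [0.0] * 99_897
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

A density this low would mean the feature fires on roughly one token in a thousand, which fits a feature tied to a narrow class of punctuation tokens.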