INDEX
Explanations
phrases in a specific format, likely related to statements or quotes in a structured discussion
instances of punctuation, particularly parentheses and commas
Negative Logits
.
-0.66
--
-0.62
!
-0.61
,
-0.59
("
-0.58
(
-0.54
.
-0.49
---
-0.48
--
-0.47
and
-0.47
Positive Logits
),"
2.82
)",
2.80
)."
2.73
)"
2.71
)</
2.70
)[
2.51
)=
2.41
)]
2.40
)/
2.35
)'
2.33
Activations Density 0.013%