Explanations
phrases that are enclosed in quotation marks
phrases that include quotations
Negative Logits
—            -0.55
arnaev       -0.52
bably        -0.49
--           -0.47
laborers     -0.45
mistaken     -0.45
(@           -0.45
firsthand    -0.45
rompt        -0.45
afterward    -0.44
Positive Logits
",           3.24
!",          2.66
?",          2.62
)",          2.62
.",          2.56
".           2.54
".[          2.51
",           2.48
"),          2.45
"],          2.34
Activations Density: 0.018%