INDEX
Explanations
expressions of humor or laughter
Negative Logits
?");    -0.82
)";     -0.80
.",     -0.79
%");    -0.79
?")     -0.78
"),     -0.76
*/;     -0.76
.")     -0.76
:");    -0.75
.」      -0.72
Positive Logits
<eos>         0.74
↵↵            0.64
But           0.62
And           0.61
This          0.60
!             0.59
Especially    0.57
They          0.57
(             0.57
I             0.57
Activations Density: 0.145%