Explanations
expressions of gratitude and inquiries about understanding or clarifying various topics
Negative Logits
Token   Logit
!」      -1.03
?」      -1.02
?";     -0.81
?»      -0.76
?");    -0.75
!';     -0.75
!");    -0.72
?')     -0.71
!');    -0.71
?”      -0.70
Positive Logits
Token   Logit
!       1.94
?       1.71
!)      1.29
?)      1.10
!"      1.10
؟       1.09
!”      1.08
!?      1.04
!!      1.04
!'      1.03
Activations Density: 0.305%