Explanations
exclamatory phrases or emotional responses
Negative Logits
"):      -1.07
`,       -1.06
.")      -1.04
)");     -1.02
".       -1.01
$")      -0.98
'),      -0.93
"},      -0.93
'):      -0.92
}")      -0.92
Positive Logits
!        3.13
!!       2.53
!!!      2.47
!        2.31
!!!!     2.20
!        2.19
!)       2.19
!"       2.17
!”       2.08
!!!!!    2.04
Activations Density 0.867%