INDEX
Explanations
exclamatory questions expressing surprise or disbelief
phrases that express confusion or surprise, often punctuated with exclamatory or interrogative marks
Negative Logits
Token       Logit
ּ           -0.78
amen        -0.73
hob         -0.71
itions      -0.70
odor        -0.68
corrid      -0.68
purified    -0.68
missions    -0.68
rive        -0.68
iral        -0.67
Positive Logits
Token       Logit
?!          1.52
?,          1.03
!?          1.02
?!"         1.01
!!          0.99
#$          0.98
!!!!!       0.98
Huh         0.98
??          0.97
Anyway      0.96
Activations Density 0.008%