INDEX
Explanations
questions or statements expressing doubt or disbelief
expressions of confusion or disbelief about specific situations
Negative Logits
xon         -0.84
tsy         -0.69
pter        -0.68
vantage     -0.67
tnc         -0.65
aeper       -0.64
ieu         -0.61
catentry    -0.61
cu          -0.61
folios      -0.60
Positive Logits
?!"     1.23
?!      1.18
!?"     1.17
!?      1.13
?",     0.98
?"      0.98
???     0.98
?)      0.96
?       0.96
?),     0.95
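Feature dashboards of this kind usually derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding and ranking vocabulary tokens by the result. The sketch below assumes that method; the card itself does not say so. It ignores the final LayerNorm, and all inputs (`decoder_dir`, `w_u`, the toy vocabulary) are hypothetical, made-up values rather than data from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float], w_u: Sequence[Sequence[float]]) -> List[float]:
    # decoder_dir: the feature's decoder vector, length d_model (hypothetical input).
    # w_u: unembedding matrix laid out as d_model rows by vocab_size columns.
    # Returns, per vocabulary token, the dot product of the decoder direction
    # with that token's unembedding column (final LayerNorm is ignored here).
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[r] * w_u[r][t] for r in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_logits(
    effects: Sequence[float], tokens: Sequence[str], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort token ids by their logit effect, ascending.
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    # Largest increases first.
    positive = [(tokens[t], effects[t]) for t in order[::-1][:k]]
    # Largest decreases first.
    negative = [(tokens[t], effects[t]) for t in order[:k]]
    return positive, negative


# Toy example: d_model = 2 and a 4-token vocabulary with made-up weights.
tokens = ["?", "!?", "xon", "the"]
decoder_dir = [1.0, -0.5]
w_u = [
    [0.9, 1.0, -0.6, 0.1],
    [-0.1, -0.3, 0.4, 0.2],
]
positive, negative = top_logits(logit_effects(decoder_dir, w_u), tokens, k=2)
print(positive)
print(negative)
```

In this toy setup "!?" and "?" come out as the top positive tokens and "xon" as the most negative, mirroring the shape of the lists above without reproducing their values.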
Activation Density: 0.184%
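Activation density is commonly reported as the percentage of sampled tokens on which the feature fires (activation above zero). The sketch below assumes that definition and a threshold of 0.0; `activation_density` is a hypothetical helper, and the sample is invented purely to show the arithmetic, not the dataset behind this card.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    # Percentage of tokens whose feature activation exceeds the threshold.
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 184 firing tokens out of 100,000.
sample = [0.0] * 99_816 + [1.0] * 184
print(f"{activation_density(sample):.3f}%")
```

Under this definition, a density of 0.184% means the feature is active on roughly 1 token in 540, i.e. it is a sparse feature.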