INDEX
Explanations
rhetorical questions and humor in conversational contexts
Negative Logits
Token          Logit
emo            -0.16
antry          -0.16
aders          -0.15
Surprise       -0.15
issen          -0.14
transformed    -0.14
/modal         -0.14
ichier         -0.14
ivo            -0.14
ont            -0.14
Positive Logits
Token          Logit
scatter        0.16
UGHT           0.16
ENCH           0.15
Scatter        0.15
Ere            0.14
港             0.14
itra           0.14
oleč           0.14
Flor           0.14
lags           0.14
Activations Density: 0.028%