INDEX
Explanations
rhetorical questions emphasizing uncertainty or concern
Negative Logits
Token       Logit
quartered   -0.84
inous       -0.70
esta        -0.68
una         -0.68
ificant     -0.67
iquette     -0.66
gest        -0.65
à¨          -0.65
cipled      -0.65
stem        -0.64
Positive Logits
Token       Logit
Wouldn      1.47
Suppose     1.28
Surely      1.26
Would       1.23
Could       1.09
Imagine     1.04
Turns       1.02
Maybe       1.01
Well        1.01
Isn         0.98
Activations Density: 0.057%