INDEX
Explanations
questions or statements expressing uncertainty or speculation
Negative Logits
ciating   -0.68
ertodd    -0.66
herent    -0.58
phrine    -0.58
Vital     -0.54
umsy      -0.54
cakes     -0.54
etsk      -0.54
ACTION    -0.52
PRESS     -0.52
Positive Logits
?),       0.66
?!        0.58
?).       0.58
?:        0.57
darn      0.56
?)        0.56
suppose   0.56
wonder    0.55
chalk     0.53
prest     0.53
Activations Density: 7.734%