INDEX
Explanations
phrases indicating hesitation, curiosity, or introspection
phrases expressing uncertainty or difficulty in actions
Negative Logits
illary      -0.83
Offline     -0.76
ver         -0.68
Delivery    -0.68
ftime       -0.67
ories       -0.65
ilater      -0.62
liest       -0.62
ificial     -0.61
Dru         -0.61
Positive Logits
grin        1.06
laugh       1.01
chuckle     0.99
wonder      0.99
feel        0.98
notice      0.96
smile       0.94
feeling     0.88
impressed   0.87
noticing    0.86
Activations Density: 0.070%