Explanations
questions or uncertainties about what action to take
phrases expressing uncertainty or confusion about actions
Negative Logits
quad          -0.65
aires         -0.64
panel         -0.64
members       -0.62
ĵ             -0.61
ģĸ            -0.61
proving       -0.60
vanquished    -0.59
validated     -0.59
¹             -0.59
Positive Logits
expect        1.28
ilers         0.89
classify      0.89
igl           0.88
prioritize    0.87
say           0.86
believe       0.84
eat           0.84
Expect        0.84
buy           0.83
Activations Density: 0.042%