INDEX
Explanations
phrases indicating uncertainty or indecision
references to indecision or uncertainty about actions
NEGATIVE LOGITS
arity                             -0.81
////////////////                  -0.60
Chap                              -0.60
hyde                              -0.60
quad                              -0.59
////////////////////////////////  -0.58
mony                              -0.58
Barrett                           -0.57
thwarted                          -0.57
ĵ                                 -0.55
POSITIVE LOGITS
expect      1.38
Expect      0.98
do          0.92
look        0.89
eat         0.88
wear        0.86
prioritize  0.86
anticipate  0.83
ask         0.81
believe     0.81
Activations Density: 0.037%