INDEX
Explanations
questions, particularly those ending with a question mark and focusing on decision-making
rhetorical questions and expressions of uncertainty
Negative Logits
oun         -0.56
hemer       -0.56
ogly        -0.53
bryce       -0.53
utor        -0.52
yp          -0.52
ouf         -0.52
DX          -0.51
outhern     -0.50
ogyn        -0.50
Positive Logits
Eventually  0.68
Especially  0.63
Ultimately  0.62
Whether     0.61
regardless  0.61
lest        0.61
Otherwise   0.60
Instead     0.60
Unless      0.58
inaction    0.57
Activations Density 1.064%