INDEX
Explanations
verbs expressing effort or willingness to take action
phrases emphasizing the ability or effort to take action
Negative Logits
IB          -0.66
UR          -0.63
Politics    -0.61
Cait        -0.60
rejection   -0.60
Sands       -0.60
Passage     -0.59
leaked      -0.58
Irving      -0.58
Winner      -0.56
Positive Logits
muster      1.02
berra       0.96
't          0.93
feas        0.89
afford      0.88
adian       0.83
emulate     0.82
nesota      0.80
strip       0.78
aido        0.77
Activations Density: 0.081%