INDEX
Explanations
phrases emphasizing responsibility or significant actions taken in various contexts
Negative Logits
ddy        -0.16
odst       -0.16
take       -0.16
takeaway   -0.15
±          -0.15
fur        -0.15
yen        -0.15
TAKE       -0.14
Take       -0.14
cka        -0.14
Positive Logits
initiative   0.33
plunge       0.32
opportunity  0.29
reins        0.27
bull         0.26
liberty      0.26
Initiative   0.25
lead         0.25
cue          0.25
helm         0.25
Activations Density 0.029%