INDEX
Explanations
phrases related to taking actions or steps
repeated mentions of the word "the" indicating a focus on articles
Negative Logits
-+-+         -0.82
lished       -0.81
tions        -0.79
alde         -0.77
tion         -0.77
lich         -0.76
Operation    -0.76
ambo         -0.74
cade         -0.73
ntil         -0.73
Positive Logits
brunt        1.35
plunge       1.34
opportunity  1.16
reins        1.15
initiative   1.12
helm         1.10
bait         1.07
blame        1.07
liberty      1.04
cue          0.95
Activations Density 0.049%