Explanations
phrases indicating progress or movement towards goals
Negative Logits
Token      Logit
ongo       -0.16
ATURE      -0.16
ego        -0.15
olie       -0.15
upfront    -0.14
ature      -0.14
riad       -0.14
HOLDER     -0.14
imates     -0.14
INATION    -0.14
Positive Logits
Token      Logit
wards       0.30
/down       0.30
/back       0.30
ward        0.27
ly          0.26
/up         0.22
-thinking   0.21
into        0.21
-facing     0.19
most        0.19
Activation Density: 0.052%