INDEX
    Explanations

    phrases indicating progress or movement towards goals

    Negative Logits
    ongo      -0.16
    ATURE     -0.16
    ego       -0.15
    olie      -0.15
     upfront  -0.14
    ature     -0.14
    riad      -0.14
    HOLDER    -0.14
    imates    -0.14
    INATION   -0.14
    Positive Logits
    wards     0.30
    /down     0.30
    /back     0.30
    ward      0.27
    ly        0.26
    /up       0.22
    -thinking 0.21
     into     0.21
    -facing   0.19
    most      0.19
    Act Density 0.052%

    No Known Activations