INDEX
    Explanations

    phrases related to progress or advancement

    references to progress or advancements measured in steps

    Negative Logits
        "ecause"       -0.82
        "liction"      -0.77
        "eer"          -0.72
        "nces"         -0.70
        "elta"         -0.67
        "licts"        -0.67
        "iqueness"     -0.65
        "inately"      -0.63
        "licted"       -0.63
        "egu"          -0.63
    Positive Logits (tokens are quoted; a leading space is part of the token)
        " forward"      1.29
        " backward"     1.18
        " backwards"    1.11
        " forwards"     1.09
        " toward"       1.07
        " towards"      1.04
        " closer"       0.99
        " Forward"      0.99
        "frog"          0.99
        "forward"       0.96
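    The two lists above rank tokens by how strongly this feature pushes their logits up or down. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder (output) direction through the model's unembedding matrix and sort the resulting per-token scores. The sketch below uses a made-up 2-dimensional model and a 4-token vocabulary; the names `logit_effects` and `top_and_bottom` are hypothetical, and real pipelines may also apply normalization that is omitted here.

```python
# Hypothetical sketch: rank tokens by how much a feature's decoder direction
# raises or lowers their logits through the unembedding matrix.
# All vectors, the vocabulary, and k are toy values, not the real model's.


def logit_effects(decoder_dir, unembed, vocab):
    """Score each token j as the dot product decoder_dir . unembed[:, j].

    decoder_dir: list of d_model floats (the feature's output direction)
    unembed:     d_model rows x len(vocab) columns
    """
    d_model = len(decoder_dir)
    return [
        (tok, sum(decoder_dir[i] * unembed[i][j] for i in range(d_model)))
        for j, tok in enumerate(vocab)
    ]


def top_and_bottom(scores, k):
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative


if __name__ == "__main__":
    vocab = [" forward", " backward", "frog", "ecause"]
    decoder_dir = [1.0, 0.5]
    unembed = [
        [1.0, 0.8, 0.2, -0.6],
        [0.6, 0.7, 0.1, -0.4],
    ]
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
    for tok, score in pos + neg:
        print(f"{tok!r:>12} {score:+.2f}")
```

    With these toy numbers " forward" and " backward" come out on top and "ecause" at the bottom, mirroring the shape (not the values) of the lists above.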
    Activation Density: 0.023%
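    Activation density is read here as the share of tokens on which the feature fires, so 0.023% corresponds to roughly 23 activating tokens per 100,000. The sketch below assumes "fires" means an activation strictly above zero; the function name `activation_density` and the `threshold` parameter are hypothetical, and the dashboard may use a different cutoff.

```python
# Hypothetical sketch: percentage of tokens whose feature activation exceeds
# a threshold (assumed to be 0.0, i.e. "the feature fired").


def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Toy data: 23 firing tokens out of 100,000.
    acts = [0.0] * 99_977 + [1.5] * 23
    print(f"{activation_density(acts):.3f}%")
```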

    No Known Activations