    Explanations

    phrases related to progress or advancement

    phrases related to progress or actions taken towards improvement

    Negative Logits
    Token           Logit
    "ecause"        -0.80
    "inately"       -0.78
    "ores"          -0.77
    "iqueness"      -0.75
    "licts"         -0.75
    "rums"          -0.75
    "eatures"       -0.74
    "nces"          -0.72
    "licted"        -0.71
    "liction"       -0.70
    Positive Logits
    Token           Logit
    " forward"      1.22
    " toward"       1.09
    " towards"      1.07
    " backward"     1.00
    "daughter"      0.98
    " forwards"     0.97
    "steps"         0.93
    " backwards"    0.92
    "step"          0.91
    " Forward"      0.84
    Activation Density: 0.036%

    No Known Activations