    Explanations

    phrases indicating progress or improvement

    phrases indicating progress or distance traveled toward a goal

    Negative Logits
    "urated"      -0.80
    "iries"       -0.70
    "ividual"     -0.63
    "uala"        -0.63
    "ulhu"        -0.63
    "icist"       -0.60
    "rones"       -0.59
    "zinski"      -0.58
    " pairs"      -0.58
    "iasco"       -0.58
    Positive Logits
    " toward"     0.91
    " towards"    0.87
    "WARD"        0.74
    "fare"        0.69
    " Towards"    0.68
    "lier"        0.66
    " Sabha"      0.64
    " Drawn"      0.63
    "finder"      0.63
    " separating" 0.62
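    The page does not say how these logit values were produced. A common approach for SAE feature dashboards is to project the feature's decoder direction through the model's unembedding matrix (W_U) and list the most positive and most negative entries. The sketch below assumes that approach and ignores any extra steps (such as folding in a final layer norm) that the real pipeline may apply. The function names `logit_effects` and `top_logits` and the toy numbers are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Hypothetical helper: compute decoder_dir @ W_U for every vocab token.

    decoder_dir has length d_model; unembed has d_model rows, each with
    one entry per vocab token (assumed layout of W_U).
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir and unembed must share d_model")
    effects = [0.0] * len(vocab)
    for d_i, row in zip(decoder_dir, unembed):
        if len(row) != len(vocab):
            raise ValueError("each unembed row needs one entry per token")
        for j, w in enumerate(row):
            effects[j] += d_i * w  # accumulate the dot product per token
    return dict(zip(vocab, effects))


def top_logits(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k most positive, top-k most negative) token effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


if __name__ == "__main__":
    # Toy example with d_model = 2 and a four-token vocabulary.
    vocab = [" toward", " towards", "urated", " pairs"]
    unembed = [[0.6, 0.5, -0.4, 0.1],
               [0.4, 0.4, -0.2, -0.3]]
    pos, neg = top_logits(logit_effects([1.0, 0.5], unembed, vocab), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

    With these toy values, " toward" and " towards" come out most positive and "urated" most negative, mirroring the shape (not the numbers) of the tables above.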
    Activation Density: 0.033%
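    Activation density is usually read as the percentage of sampled tokens on which the feature fires (activation above zero); 0.033% then corresponds to roughly one token in 3,000. The sketch below assumes that definition and a threshold of zero; the function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # One firing token out of 3,000 sampled tokens.
    acts = [0.0] * 2999 + [2.5]
    print(f"{activation_density(acts):.3f}%")
```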

    No Known Activations