INDEX
    Explanations

    phrases related to direction and progress towards a goal

    Negative Logits
    Token        Logit
    " Calvo"     -0.83
    " foglal"    -0.71
    " pep"       -0.70
    "5"          -0.70
                 -0.68
    " печа"      -0.66
    "t"          -0.65
    " fatica"    -0.65
    "T"          -0.64
    "'"          -0.64
    Positive Logits
    Token        Logit
    "toward"     1.94
    " toward"    1.89
    " Toward"    1.88
    "towards"    1.84
    " Towards"   1.82
    "Towards"    1.77
    " towards"   1.74
    "Toward"     1.73
    " hacia"     1.33
    " envers"    1.25
    Act Density: 0.052%

    No Known Activations