    Explanations

    phrases related to physical exertion or effort

    Negative Logits
    Token             Logit
    "psons"           -0.70
    "lys"             -0.67
    "atural"          -0.63
    "obyl"            -0.63
    " Recogn"         -0.62
    " Faces"          -0.62
    "Interstitial"    -0.61
    " redes"          -0.59
    "brance"          -0.59
    "nam"             -0.59
    Positive Logits
    Token             Logit
    " forward"        1.08
    "chairs"          1.00
    " toward"         0.98
    " boundaries"     0.93
    "back"            0.93
    " towards"        0.93
    " harder"         0.92
    " aside"          0.92
    " onward"         0.90
    " ahead"          0.86
    Activation Density: 0.507%

    No Known Activations