    Explanations

    expressions related to progress and momentum

    NEGATIVE LOGITS
    roat         -0.15
     Norm        -0.14
    eren         -0.14
    indsight     -0.14
    ť            -0.14
    é¨ĵ          -0.14
    sons         -0.14
    utherford    -0.14
    reen         -0.13
    sth          -0.13
    POSITIVE LOGITS
     momentum    0.20
    iglia        0.18
    heading      0.16
    rrha         0.15
    oku          0.15
    ertino       0.15
    иÑİ          0.15
     Momentum    0.15
    ê±°ëŀĺ       0.14
     toward      0.14
    Act Density 0.056%

    No Known Activations