INDEX
    Explanations

    phrases that indicate the beginning or initiation of actions

    Negative Logits
    udden
    -0.18
    ndx
    -0.16
    sian
    -0.14
    ertools
    -0.14
    .mx
    -0.14
    _through
    -0.14
    ागत
    -0.14
    ilib
    -0.14
    oteric
    -0.13
    झ
    -0.13
    Positive Logits
     off
    0.24
     somewhere
    0.22
     small
    0.22
     fresh
    0.21
     slow
    0.21
     right
    0.21
     wherever
    0.21
     simple
    0.20
     af
    0.20
     sentences
    0.20
    Act Density 0.069%

    No Known Activations