INDEX
    Explanations

    words related to taking action or responsibility

    instances of the word "step" and its variations

    Negative Logits
    Token        Logit
    "selage"     -0.78
    "ecause"     -0.69
    "rontal"     -0.66
    "orsche"     -0.63
    "pport"      -0.62
    "ores"       -0.60
    "raid"       -0.60
    "è¦ļéĨĴ"     -0.60
    " Reasons"   -0.59
    "herent"     -0.58
    Positive Logits
    Token        Logit
    " forth"     0.99
    "frog"       0.97
    " aside"     0.97
    " forward"   0.91
    " ashore"    0.87
    " up"        0.84
    " foot"      0.84
    "up"         0.80
    " out"       0.79
    " toe"       0.77
    Act Density 0.029%

    No Known Activations