INDEX
    Explanations

    words related to procedures or instructions

    phrases indicating a sequence or set of actions

    Negative Logits
    Token             Logit
    " Mostly"         -0.68
    "gyn"             -0.66
    " Corpus"         -0.66
    "arium"           -0.63
    "itute"           -0.63
    "fest"            -0.62
    " Confederacy"    -0.61
    "zeb"             -0.59
    " Unlimited"      -0.59
    "orter"           -0.59
    Positive Logits
    Token             Logit
    " steps"          3.94
    " Steps"          2.86
    " step"           2.33
    "steps"           2.22
    " strides"        2.19
    "step"            1.85
    " Step"           1.74
    "Step"            1.64
    " stairs"         1.62
    " footsteps"      1.61
    Activation Density: 0.013%

    No Known Activations