INDEX
    Explanations

    phrases related to progression and initial actions

    phrases indicating sequential actions or processes

    Negative Logits
    Token          Logit
    "ancies"       -0.79
    "olls"         -0.70
    "bugs"         -0.66
    "comments"     -0.65
    "resistant"    -0.64
    "ustomed"      -0.63
    "rums"         -0.63
    "sleep"        -0.61
    "noxious"      -0.59
    " sung"        -0.58
    Positive Logits
    Token          Logit
    " toward"      1.07
    " towards"     1.06
    " step"        0.93
    " Steps"       0.84
    " steps"       0.84
    "nings"        0.77
    " Towards"     0.75
    "phase"        0.74
    " hurdle"      0.73
    " Step"        0.72
    Act Density 0.091%

    No Known Activations