INDEX
    Explanations

    instances of the word "forward" and related terms indicating progress or advancement

    Negative Logits
    Token        Logit
    "hetto"      -0.16
    " reluct"    -0.15
    "eral"       -0.15
    "utes"       -0.14
    "kou"        -0.14
    "imas"       -0.14
    " shar"      -0.14
    "plete"      -0.14
    " Minute"    -0.14
    "ks"         -0.14
    Positive Logits
    Token        Logit
    "/back"      0.27
    "wards"      0.18
    "forward"    0.18
    "QUIRES"     0.18
    "-forward"   0.18
    "-thinking"  0.17
    "/down"      0.17
    "warf"       0.17
    "ward"       0.17
    " forward"   0.16
    Act Density 0.037%

    No Known Activations