INDEX
    Explanations

    instances of the word "up."

    Negative Logits
    Token     Logit
    t         -0.19
    rored     -0.17
    unas      -0.15
    ro        -0.15
    isty      -0.15
    ear       -0.14
    isis      -0.14
    ر         -0.14
    ouri      -0.14
    place     -0.14
    Positive Logits
    Token     Logit
    /down     0.22
    datable   0.22
    ping      0.19
    stairs    0.16
    turned    0.16
    trecht    0.16
    dater     0.16
    shot      0.16
    speed     0.16
    sert      0.15
    Activation Density: 0.094%

    No Known Activations