    Explanations

    emergence and persistence of states

    Negative Logits
        "Patch"           0.43
        " celebratory"    0.41
        " Sequences"      0.40
        "roller"          0.39
        " walking"        0.38
        " Targeted"       0.38
        "zh"              0.38
        " pretend"        0.38
        "例如"            0.37   (Chinese: "for example")
        " Walking"        0.37

    Positive Logits
        " arises"         0.75
        " arose"          0.73
        " arisen"         0.62
        " arise"          0.58
        " persists"       0.57
        " lingers"        0.54
        " возникает"      0.53   (Russian: "arises")
        " outweighs"      0.52
        " culminated"     0.52
        " lessened"       0.51

    Activation density: 0.045%

    No Known Activations