    Negative Logits
        Token          Logit
        " remnants"    -0.08
        " nt"          -0.07
        " Dead"        -0.07
        "Part"         -0.07
        " ratt"        -0.07
        "ART"          -0.07
        " tracks"      -0.07
        " TRACK"       -0.07
        (not shown)    -0.06
        " net"         -0.06
    Positive Logits
        Token          Logit
        " chosen"       0.14
        " choose"       0.14
        "chosen"        0.12
        " Choice"       0.11
        " Choose"       0.11
        " choosing"     0.11
        " choice"       0.11
        ".choose"       0.10
        " Choosing"     0.10
        "choice"        0.10
    Act Density 0.039%

    No Known Activations