    NEGATIVE LOGITS
    Token          Logit
    "ugu"          -0.08
    (not shown)    -0.07
    " Voj"         -0.07
    " scl"         -0.07
    " gv"          -0.06
    (not shown)    -0.06
    " capsule"     -0.06
    (not shown)    -0.06
    (not shown)    -0.06
    "_conn"        -0.06
    POSITIVE LOGITS
    Token            Logit
    "Appear"         0.07
    " Air"           0.07
    " walkthrough"   0.07
    "lb"             0.06
    "CLICK"          0.06
    "ади"            0.06
    "irst"           0.06
    "ptune"          0.06
    ".Fire"          0.06
    "lr"             0.06
    Activation Density: 0.012%

    No Known Activations