INDEX
    Explanations
    Negative Logits
    Token                 Logit
     chid                 -0.09
    estr                  -0.08
     Pyr                  -0.08
    [token not shown]     -0.08
    avra                  -0.08
     vira                 -0.07
     reviewed             -0.07
    _flip                 -0.07
    ühlen                 -0.07
    [token not shown]     -0.07
    Positive Logits
    Token                 Logit
     Kash                 0.10
     bab                  0.08
     excessive            0.07
     downright            0.07
    ني                    0.07
     obey                 0.07
    [token not shown]     0.07
    ность                 0.07
     Oktober              0.07
     Spin                 0.07
    Act Density 0.057%

    No Known Activations