INDEX
    Explanations
    Negative Logits
    " widespread"    -0.08
    " reduz"         -0.08
    "wide"           -0.08
    " biker"         -0.08
    " tenu"          -0.08
    " ment"          -0.07
    " cock"          -0.07
    " ram"           -0.07
    " recruiting"    -0.07
    " roc"           -0.07
    Positive Logits
    "一下"            0.09
    " masa"          0.08
    " Masa"          0.08
    " shed"          0.08
    " sheds"         0.08
                     0.08
    "_hd"            0.07
    " kế"            0.07
    " BEL"           0.07
    "ите"            0.07
    Act Density 0.005%

    No Known Activations