    Negative Logits
    Token                    Logit
    " Reich"                 -0.08
    (token not displayed)    -0.08
    (token not displayed)    -0.08
    " cease"                 -0.07
    (token not displayed)    -0.07
    "REAL"                   -0.07
    "_png"                   -0.07
    " secretly"              -0.07
    " Ferm"                  -0.07
    " Fiction"               -0.07
    Positive Logits
    Token                    Logit
    " removes"               0.10
    " Removes"               0.09
    " stripping"             0.09
    " eliminar"              0.09
    " eliminates"            0.09
    "pover"                  0.08
    " strips"                0.08
    " పని"                   0.08
    " remover"               0.08
    " remo"                  0.08
    Activation density: 0.001%

    No Known Activations