    NEGATIVE LOGITS
    Token             Logit
    "andoned"         -0.08
    " الش"            -0.08
    " inutil"         -0.08
    "442"             -0.08
    "449"             -0.08
    " punctuation"    -0.07
    "788"             -0.07
    " implanted"      -0.07
    "384"             -0.07
    "แน"              -0.07
    POSITIVE LOGITS
    Token             Logit
    " scaling"        0.08
    " Scaling"        0.08
    "Scaling"         0.08
    "caling"          0.08
    " houses"         0.08
    " Fro"            0.08
    " nephew"         0.07
    " Ursachen"       0.07
    " halve"          0.07
    "_scal"           0.07
    Activation Density: 0.001%

    No Known Activations