    Negative Logits
    Token                  Logit
    " was"                 -0.07
    (token not captured)   -0.07
    " sollte"              -0.07
    " differs"             -0.06
    " rests"               -0.06
    " is"                  -0.06
    "щення"                -0.06
    (token not captured)   -0.06
    "Esta"                 -0.06
    "))+"                  -0.06
    Positive Logits
    Token                  Logit
    " بل"                  0.07
    "lıkla"                0.07
    "ramework"             0.06
    "ewise"                0.06
    " Inside"              0.06
    " Highlights"          0.06
    " Persian"             0.06
    "orton"                0.06
    " included"            0.06
    "ерж"                  0.06
    Activation Density: 0.012%

    No Known Activations