INDEX
    NEGATIVE LOGITS
    (token missing)   -0.08
    " wed"            -0.08
    "بيع"             -0.08
    " Parteien"       -0.07
    "Pieces"          -0.07
    "€¢"              -0.07
    " hello"          -0.07
    "bib"             -0.07
    "_piece"          -0.07
    " hoppas"         -0.07
    POSITIVE LOGITS
    " Jensen"         0.09
    "entropy"         0.08
    " المد"           0.08
    " entropy"        0.08
    " aka"            0.08
    " sqrt"           0.07
    " obedient"       0.07
    " Yacht"          0.07
    "(password"       0.07
    "uedes"           0.07
    Activation Density: 0.004%

    No Known Activations