INDEX
    Explanations
    Negative Logits
     العل  -0.07
    -0.06
     Ionic  -0.06
     нату  -0.06
    iores  -0.06
     Chic  -0.06
    ilee  -0.06
     Böl  -0.06
     Kre  -0.06
     öğret  -0.06
    Positive Logits
    man  0.29
    MAN  0.25
     Man  0.20
     man  0.19
    Man  0.18
    -man  0.17
     MAN  0.17
    mann  0.16
    -Man  0.15
    _Man  0.14
    Activation Density: 0.061%

    No Known Activations