INDEX
    Explanations
    Negative Logits
    " интер"          -0.07
    " رئیس"           -0.07
    "uD"              -0.07
    " Narrative"      -0.07
    " OPTIONAL"       -0.07
    " sana"           -0.07
    "ricao"           -0.06
    "(identity"       -0.06
                      -0.06
    "(Char"           -0.06
    Positive Logits
    " WILL"           0.06
    " sticking"       0.06
    "raham"           0.06
    " wrinkles"       0.06
    ".js"             0.05
    " broadcasters"   0.05
    "χεί"             0.05
    " Pf"             0.05
    "Pl"              0.05
    "ئيس"             0.05
    Act Density 0.000%

    No Known Activations