    Negative Logits
    "Afr"            -0.08
    "нач"            -0.08
    "ыль"            -0.08
    " skinny"        -0.08
    "rej"            -0.08
    "Summary"        -0.08
    " anz"           -0.07
    " inequality"    -0.07
    "ney"            -0.07
    "jack"           -0.07
    Positive Logits
    " જેમાં"          0.10
    " जिससे"         0.09
    " જેના"          0.09
    " Vit"           0.09
    " जिन्हें"         0.08
    " wenn"          0.08
    " která"         0.08
                     0.08
    " aber"          0.08
    " yenye"         0.08
    Activation Density: 0.073%

    No Known Activations