    NEGATIVE LOGITS
    [token text missing]  2.14
    ا  1.92
    ες  1.80
    িম  1.77
    एस  1.75
    да  1.72
     sót  1.72
    ı  1.70
     זה  1.70
    у  1.63
    POSITIVE LOGITS
    a  2.23
    iendo  1.95
    vents  1.92
    rt  1.89
    ی  1.88
    ielles  1.84
    cage  1.83
    ாலிக  1.82
    ciones  1.80
    cendo  1.80
    Activation Density: 0.013%

    No Known Activations