INDEX
    NEGATIVE LOGITS
    Token         Logit
    "م"           1.01
    "ல்"          0.97
    " 대해서"     0.97
    "ترك"         0.87
    " இருந்து"    0.86
    [not shown]   0.85
    "on"          0.83
    "ка"          0.83
    "τών"         0.82
    "त्रि"        0.82
    POSITIVE LOGITS
    Token         Logit
    "givings"     1.50
    " gracias"    1.47
    " thanks"     1.42
    " grazie"     1.40
    " graças"     1.33
    "giving"      1.29
    "Thanks"      1.28
    " Thanks"     1.27
    "thanks"      1.22
    " díky"       1.18
    Activation Density: 0.012%

    No Known Activations