    Negative Logits
    ಾಯಿತು
    -0.08
     الصغيرة
    -0.08
    ứng
    -0.07
     infinitely
    -0.07
    שור
    -0.07
     advies
    -0.07
     apologized
    -0.07
     prayed
    -0.07
    ט
    -0.07
     ↵        ↵
    -0.07
    Positive Logits
     तुलना
    0.11
     comparison
    0.11
     comparaison
    0.11
     comparação
    0.11
     তুল
    0.10
     comparative
    0.10
     비교
    0.10
     Comparative
    0.10
     Comparison
    0.10
    Comparison
    0.10
    Act Density 0.061%

    No Known Activations