    Negative Logits
    Token            Logit
    " their"         -1.95
    " its"           -1.49
    (not shown)      -1.38
    " dua"           -1.34
    " februari"      -1.30
    "之色"           -1.27
    " have"          -1.27
    " on"            -1.25
    " tumba"         -1.23
    " Desember"      -1.22
    Positive Logits
    Token            Logit
    "Ingredienti"    1.58
    "Torna"          1.47
    "也被"           1.34
    "lahraga"        1.34
    "بسم"            1.33
    "比起"           1.32
    " erhi"          1.30
    "不管是"         1.28
    " élar"          1.27
    " estábamos"     1.27
    Activation Density: 0.001%

    No Known Activations