INDEX
    Explanations
    Negative Logits
     velg           -1.03
     Jumlah         -0.93
     Modific        -0.89
     Fungsi         -0.89
    jelaskan        -0.89
     Fakta          -0.87
     Gå             -0.87
     seleccionadas  -0.86
     personalizada  -0.86
    surprised       -0.85
    Positive Logits
     before         0.95
     objectAtIndex  0.82
     antes          0.82
     ensure         0.81
    onek            0.81
    bise            0.80
    nameof          0.80
     przed          0.78
    お勧め             0.78
     genoemd        0.77
    Activation Density: 0.004%

    No Known Activations