    NEGATIVE LOGITS
    " of"            -1.11
    "sibilidad"      -0.84
    " dijual"        -0.84
    "fisk"           -0.81
    "ниях"           -0.80
    " societal"      -0.79
    " dipakai"       -0.78
    " число"         -0.75
    " אביב"          -0.74
    "склю"           -0.74
    POSITIVE LOGITS
    " money"         0.90
    " Añade"         0.88
    " romano"        0.85
    "Ridge"          0.81
    " طويلة"         0.80
    "нес"            0.78
    " hours"         0.77
    " involucrados"  0.77
    " moneys"        0.77
    " gummies"       0.75
    Activation density: 0.002%

    No known activations.