INDEX
    Explanations

    instances of quotation marks or speech indicators

    Negative Logits
     Италијани    -0.68
    Diweddarwch    -0.66
    uxxxx    -0.65
     مشين    -0.65
    ValueStyle    -0.60
     ujednoznacz    -0.60
     autorytatywna    -0.58
     disambiguazione    -0.57
     ब्रेकडाउन    -0.57
     estekak    -0.56
    Positive Logits
     tarko    0.41
     nė    0.39
     Otherwise    0.37
    0.36
     sufficient    0.35
     hence    0.35
     otherwise    0.35
     suficientes    0.35
    обходи    0.34
     deployed    0.34
    Activation Density: 0.019%

    No Known Activations