    Explanations

    "much" followed by a comparative

    Negative Logits
    " নারায়ণ"  1.74
    "ipp"  1.71
    " أيضاً"  1.70
    " Utilize"  1.69
    "ine"  1.68
    "yk"  1.65
    " ਸ਼"  1.63
    "ATTRIBUTE"  1.63
    "inated"  1.62
    "‍♀️"  1.59
    Positive Logits
    "<0x80>"  2.28
    2.00
    "なり"  1.88
    "కు"  1.87
    "不错的"  1.84
    1.80
    "不必"  1.78
    "ación"  1.73
    "으면"  1.71
    " sayıda"  1.71
    Act Density 0.261%

    No Known Activations