INDEX
    Explanations

    numbers and punctuation

    Negative Logits
    Token         Logit
    " ثم"         -0.75
    " can"        -0.73
    " verme"      -0.69
    "rdida"       -0.67
    "小的"        -0.66
    " Bremer"     -0.66
    "roskop"      -0.66
    " JEE"        -0.66
    " Kush"       -0.65
    "dostęp"      -0.65
    Positive Logits
    Token             Logit
    (token missing)   0.87
    "ímos"            0.80
    (token missing)   0.79
    " namanya"        0.79
    " bình"           0.78
    " renang"         0.78
    " オイル"         0.76
    "Ikke"            0.73
    " fantástico"     0.72
    " верно"          0.72
    Act Density 0.040%

    No Known Activations