INDEX
    Explanations
    Negative Logits
     وتسجيلات  -0.70
     Egli  -0.66
    omości  -0.65
    іод  -0.65
     Castor  -0.64
    ண்டும்  -0.64
     Majefty  -0.63
     ocurrido  -0.62
    seur  -0.60
    erweise  -0.59
    Positive Logits
     thank  1.03
     Thank  1.01
    Thank  0.98
    thank  0.94
     thanks  0.93
     THANK  0.91
    kyou  0.88
    Thanks  0.86
     gracias  0.85
     Thanks  0.84
    Activation Density: 0.035%

    No Known Activations