INDEX
    Explanations

    special tokens or symbols

    Negative Logits
      " další"          1.58
      " relación"       1.53
      " destacó"        1.52
      " coût"           1.48
      " creatividad"    1.44
      "Nick"            1.43
      " Tidak"          1.42
      " lainnya"        1.42
      " ďal"            1.41
      " sosok"          1.40
    Positive Logits
      " solidly"            0.82
      " unnecessarily"      0.80
      " appropriate"        0.78
      " horribly"           0.77
      " over"               0.76
      " overs"              0.75
      "pots"                0.75
      " inappropriately"    0.74
      " indiscrimin"        0.73
      "tied"                0.73
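The two logit lists rank tokens by how strongly this feature pushes their next-token logits down or up. A minimal sketch of how such lists could be produced, assuming (this page does not confirm it) that each token's score is the feature's decoder direction multiplied by the model's unembedding matrix. `top_logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and the toy matrix in the usage lines is made up for illustration. The sketch returns signed scores, so entries on its negative side come out as negative numbers.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k tokens whose logits the feature raises most and lowers most.

    Assumption (hypothetical): the per-token effect is decoder_dir @ W_U,
    where decoder_dir has shape (d_model,) and W_U has shape (d_model, vocab_size).
    """
    effect = decoder_dir @ W_U            # one signed score per vocabulary token
    order = np.argsort(effect)            # ascending: most negative first
    negative = [(vocab[i], float(effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(effect[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy usage with a 2-dimensional model and a 4-token vocabulary (made-up numbers).
vocab = [" over", "tied", " coût", " sosok"]
W_U = np.array([[0.76, 0.73, -1.48, -1.40],
                [0.00, 0.00, 0.00, 0.00]])
positive, negative = top_logit_effects(np.array([1.0, 0.0]), W_U, vocab, k=2)
```

Here `positive` lists " over" then "tied", and `negative` lists " coût" then " sosok".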
    Act Density 0.007%
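An activation density of 0.007% means the feature fires on roughly 7 of every 100,000 tokens. A minimal sketch of the computation, assuming a token counts as active when its activation is strictly greater than zero; `activation_density` and its `threshold` parameter are hypothetical names chosen for illustration.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption (hypothetical): "active" means activation > threshold, default 0.0.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 7 active tokens out of 100,000 gives a density of about 0.007%.
acts = np.zeros(100_000)
acts[:7] = 2.5
density = activation_density(acts)
```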

    No Known Activations