INDEX
    Explanations
    Negative Logits
    " finest"         -0.81
    "ylvanian"        -0.81
    " emo"            -0.79
    "baix"            -0.79
    " priced"         -0.77
    " my"             -0.75
    " fær"            -0.75
    " wojny"          -0.73
    " Espero"         -0.72
                      -0.72
    Positive Logits
    "Ikke"            0.89
    "ーー"            0.88
    "SOURCE"          0.88
    " Menurut"        0.86
    " Setelah"        0.86
    "Credit"          0.85
    "ونا"             0.85
    "ängerin"         0.84
    "αι"              0.83
    "zeichnungen"     0.82
    Activation Density: 0.026%

    No Known Activations