    Negative Logits
    " biru"            0.32
    " masalah"         0.30
    " rapaz"           0.30
    " conundrum"       0.29
    " intrig"          0.29
    " laranja"         0.29
    " problemas"       0.29
    " jornalista"      0.29
    " judul"           0.29
    " encontr"         0.28

    Positive Logits
    "某些"              0.27
    " функциона"       0.26
    "ከናወ"              0.26
    "owered"           0.26
    "过程中"            0.25
    (not rendered)     0.25
    " ఉత్ప"            0.25
    " এতটাই"           0.24
    "াস"               0.24
    "ifferentiated"    0.24

    Activation density: 0.574%

    No Known Activations