    Negative Logits
        Token             Logit
        " my"             -0.54
        " maravilh"       -0.53
        " travailleurs"   -0.53
        " dieux"          -0.51
        " navires"        -0.50
        " vroeg"          -0.50
        " miei"           -0.50
        " legais"         -0.49
        " мои"            -0.49
        " âmes"           -0.49
    Positive Logits
        Token             Logit
        "<bos>"            1.08
        " كومونز"          0.74
        " مرئيه"           0.73
        " more"            0.72
        " autorytatywna"   0.72
        "more"             0.71
        "ArrowToggle"      0.71
        "tanleria"         0.69
        " الحره"           0.66
        " فريبيس"          0.65
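The page does not say how its logit lists are produced. A common approach for SAE feature dashboards is to project the feature's decoder direction through the model's unembedding matrix and report the most negative and most positive entries. The sketch below shows that approach under stated assumptions: the function name `logit_effects`, the plain nested-list "matrices", and the toy numbers are all hypothetical. It also ignores details such as folding in a final LayerNorm, which a real pipeline may handle differently.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    w_unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical sketch: direct logit effect of one feature on every token."""
    d_model = len(decoder_dir)
    if len(w_unembed) != d_model:
        raise ValueError("w_unembed must have one row per model dimension")
    # Score for token t is the dot product decoder_dir . W_U[:, t]
    scores = [
        sum(decoder_dir[i] * w_unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy example with d_model = 2 and a three-token vocabulary (made-up values).
    neg, pos = logit_effects(
        [1.0, -0.5],
        [[0.25, -0.5, 1.0], [0.5, 0.25, -0.25]],
        [" my", "more", "<bos>"],
        k=2,
    )
    print("negative:", neg)
    print("positive:", pos)
```

Sorting the full score list once and slicing both ends keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` and `heapq.nlargest` would avoid the full sort.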
    Activation Density: 0.052%
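Activation density is usually read as the share of tokens on which the feature fires. Under that reading, 0.052% is about 52 tokens per 100,000, or roughly one token in 1,900. The minimal sketch below computes the figure from a stream of per-token activations. It assumes "fires" means an activation strictly above a threshold of 0. The function name and threshold are illustrative assumptions, not something this page specifies.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of tokens whose activation exceeds `threshold`."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:  # assumption: "firing" means strictly above the threshold
            firing += 1
    if total == 0:
        raise ValueError("need at least one token to compute a density")
    return 100.0 * firing / total


if __name__ == "__main__":
    # 52 firing tokens out of 100,000 gives the 0.052% shown above.
    acts = [0.0] * 99_948 + [1.0] * 52
    print(f"{activation_density(acts):.3f}%")
```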

    No Known Activations