INDEX
    Explanations
    NEGATIVE LOGITS
    ׇ    -1.45
     my    -1.28
     cukru    -1.21
     dikenal    -1.20
     dictionnaire    -1.16
    figurine    -1.10
     escritas    -1.10
     menjadi    -1.09
     сахара    -1.09
     for    -1.02
    POSITIVE LOGITS
    bemos    1.39
    っており    1.34
    lograph    1.30
    ִּ    1.23
    ्ह    1.23
    BeforeAll    1.20
    ְּ    1.19
     yesterday    1.18
    llac    1.18
     우리    1.16
    Act Density 0.013%

    No Known Activations