INDEX
    NEGATIVE LOGITS
    en              1.50
    EN              1.32
                    1.19
                    1.16
                    1.10
     acabar         1.09
                    1.09
    shows           1.09
    executor        1.05
    ε               1.04
    POSITIVE LOGITS
    tones           1.45
    leftrightarrow  1.44
    whel            1.36
    joyed           1.32
    lapping         1.29
    emphas          1.28
    ként            1.27
    whelming        1.24
    valued          1.24
     apă            1.15
    Act Density 0.055%

    No Known Activations