INDEX
    Explanations

    words related to language proficiency and multilingualism

    Negative Logits
    zilla   -0.18
    inka    -0.18
    neh     -0.17
    oa      -0.16
    oval    -0.15
    orca    -0.15
    znam    -0.14
    ooke    -0.14
    èļ      -0.14
    pto     -0.14
    Positive Logits
    ्तक     0.17
    653     0.15
    ahr     0.15
     Barbar 0.15
    ンフ    0.15
    898     0.15
    596     0.14
    98      0.14
    620     0.14
    998     0.14
    Act Density 0.066%

    No Known Activations