INDEX
    Explanations

    words and phrases conveying inclusion or agreement

    Negative Logits
    Ľi        -0.17
    arus      -0.15
    Äįi       -0.15
    aso       -0.14
    rious     -0.14
    jeme      -0.14
    ŀ         -0.14
    ics       -0.14
     âľ       -0.13
    wald      -0.13
    Positive Logits
    -ÑĤаки    0.15
    ål        0.14
    ekk       0.13
    ytt       0.13
    dw        0.13
    wend      0.13
    yt        0.13
    tw        0.13
    exact     0.13
    illez     0.12
    Activation Density 0.093%

    No Known Activations