    Explanations

    specific non-English or foreign language terms

    Negative Logits
    ma      -0.23
    pa      -0.20
    me      -0.19
    med     -0.19
    li      -0.18
    s       -0.18
    ese     -0.18
    ses     -0.18
    ii      -0.18
    pu      -0.18
    Positive Logits
    akov    0.19
    Leaks   0.19
    yum     0.19
    yas     0.18
    yaw     0.18
    ê¹      0.18
    yat     0.17
    yar     0.17
    elho    0.16
    yo      0.16
    Activation Density: 0.682%

    No Known Activations