    Explanations

    unusual characters and symbols, potentially related to a specific language or writing system

    non-standard characters or symbols

    Negative Logits
    Token             Logit
    "oleon"           -0.79
    "theless"         -0.74
    "wagen"           -0.66
    "ierrez"          -0.66
    "ktop"            -0.66
    " charm"          -0.64
    "APS"             -0.63
    " concede"        -0.63
    "enegger"         -0.62
    "ãĥ¼ãĥĨãĤ£"       -0.62
    Positive Logits
    Token             Logit
    "¹"               1.30
    "ª"               1.16
    "º"               1.14
    "©¶æ¥µ"           1.12
    "¨"               1.12
    "±"               1.12
    "¢"               1.10
    "²"               1.09
    "Ń"               1.07
    "¥"               1.02
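The logit tokens are easier to interpret once mapped back to raw bytes. The sketch below assumes the underlying model uses a GPT-2-style byte-level BPE tokenizer, which the dashboard does not state. Under that assumption each displayed character stands for exactly one byte, using the byte-to-character table from OpenAI's GPT-2 `encoder.py`. Every single-character positive-logit token (`¹`, `ª`, `º`, ...) is then a lone byte in the range 0x80 to 0xBF, i.e. a UTF-8 continuation byte rather than a standalone character, and the negative-logit token `ãĥ¼ãĥĨãĤ£` decodes to the katakana fragment `ーティ`. The helper names `token_bytes` and `describe` are hypothetical.

```python
"""Map GPT-2-style byte-level BPE token strings back to raw bytes.

Assumption (not stated by the dashboard): the model uses a GPT-2-style
byte-level tokenizer that displays every byte as a printable character.
"""


def bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's byte -> printable-character table."""
    # Bytes that are already printable keep their own character.
    byte_values = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    code_points = byte_values[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at U+0100.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def token_bytes(token: str) -> bytes:
    """Return the raw bytes that a displayed byte-level token stands for.

    Hypothetical helper. Tokens the dashboard already rendered with a
    plain leading space (e.g. " charm") are not in byte-level form and
    would raise KeyError here.
    """
    return bytes(BYTE_DECODER[ch] for ch in token)


def describe(token: str) -> str:
    """Decode as UTF-8 when the bytes form complete characters, else show hex."""
    raw = token_bytes(token)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return "partial UTF-8: " + raw.hex(" ")


for tok in ["ãĥ¼ãĥĨãĤ£", "©¶æ¥µ", "¹", "Ń"]:
    print(repr(tok), "->", describe(tok))
```

Under this assumption, tokens that fail strict UTF-8 decoding are fragments of multi-byte characters, which fits the explanations' description of unusual or non-standard characters.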
    Activation Density: 0.026%

    No Known Activations