INDEX
    Explanations

    very high activation values on special or unusual characters in the text

    Negative Logits
    ogan        -0.17
    або         -0.16
    fty         -0.15
    loat        -0.14
    á»ķi        -0.14
    aines       -0.14
    arges       -0.14
    rana        -0.14
    ÑĥÑģÑĤа     -0.14
    cede        -0.14
    Positive Logits
    ï¸          0.20
    ¦           0.17
    ï¸ı         0.17
    Į           0.16
    olle        0.15
    ĥ           0.15
    象          0.14
    ibox        0.14
    âĶģ         0.14
    idor        0.14
    Act Density 0.012%

    No Known Activations