    Explanations

    special characters and symbols in the text

    Negative Logits
    erox    -0.16
    ÃħŸ     -0.15
    ách     -0.14
    úa      -0.14
    âĸº     -0.14
    ži      -0.14
    eyh     -0.14
    uÃŃ     -0.14
     ¶      -0.13
    arrow   -0.13
    Positive Logits
    É       0.36
    Ê       0.31
    Ë       0.31
    ÉĻ      0.29
    Ì       0.21
     -/↵    0.19
    ɵ       0.18
    Ãĭ      0.17
    á       0.16
    ænd     0.16
    Activation Density: 0.005%

    No Known Activations