INDEX
    Explanations

    numeric or symbolic characters that appear at the end of words

    sequences or symbols that signify a particular emphasis or pattern, likely related to coded or specialized language

    Negative Logits
    Token             Logit
    " Mub"            -0.74
    "bda"             -0.72
    " rake"           -0.71
    "Downloadha"      -0.68
    " disenfranch"    -0.65
    "ukong"           -0.63
    "bucks"           -0.62
    " Peel"           -0.61
    " warr"           -0.60
    " levers"         -0.60

    Positive Logits
    Token             Logit
    "ħ"               1.11
    "Ĭ"               0.99
    "¡"               0.99
    "Į"               0.98
    "İ"               0.96
    "Û"               0.92
    "ŀ"               0.88
    "Ĩ"               0.87
    "Ĥª"              0.85
    "¾"               0.85
    Act Density 0.020%
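
One common reading of an activation density figure is the fraction of sampled tokens on which the feature fires at all. The source does not say how this number was computed, so the following is only a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens where the feature
    fires. The dashboard itself does not define the metric.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 2 firing tokens out of 10,000.
sample = [0.0] * 9998 + [0.7, 1.3]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.020% would correspond to the feature firing on roughly 2 of every 10,000 tokens.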

    No Known Activations