INDEX
    Explanations

    token identification and usage

    Negative Logits
    "ure"        1.19
    "ts"         1.18
    "t"          1.10
    "y"          1.03
    "taining"    1.02
    ","          1.02
    "gt"         1.00
    "ty"         0.98
    "ti"         0.94
    "in"         0.92
    Positive Logits
    " Token"       1.20
    "Token"        1.16
    "令牌"         1.11
    " getToken"    1.08
    "م"            1.06
    "ד"            1.01
    " Tokens"      1.00
    " TOKEN"       0.99
    " esperado"    0.95
    "من"           0.93
    Act Density 0.024%

    No Known Activations