INDEX
    Explanations

    punctuation marks and their surrounding contexts

    Negative Logits
    ibold        -0.16
    anton        -0.15
    ãĥģãĥ¥       -0.15
    лаз          -0.14
    964          -0.14
    ãĥªãĤ«       -0.14
     tune        -0.14
    inea         -0.14
    umber        -0.14
    fone         -0.14
    Positive Logits
     æŃ¦         0.15
    uil          0.15
    æĮĻ          0.14
    wear         0.14
    mul          0.14
    nda          0.14
    _skb         0.14
     DG          0.14
    primitive    0.14
    DG           0.13
    Act Density 0.002%

    No Known Activations