INDEX
    Explanations

    punctuation marks and formatting elements in the text

    Negative Logits
    Token            Logit
    "allas"          -0.17
    "kim"            -0.16
    "obe"            -0.15
    "hap"            -0.15
    "StringLength"   -0.14
    "assi"           -0.14
    " Τι"            -0.13
    "ayar"           -0.13
    " TResult"       -0.13
    "ону"            -0.13
    Positive Logits
    Token            Logit
    "ッド"           0.20
    "ška"            0.16
    "央"             0.15
    "usra"           0.15
    " sensible"      0.15
    " cá"            0.15
    " seri"          0.14
    "焼"             0.14
    "abinet"         0.14
    " Ulus"          0.14
    Activation Density: 0.002%

    No Known Activations