INDEX
    Explanations

    specific formatting elements and punctuation in the text

    Negative Logits
    åİ    -0.19
     تÙĨ    -0.17
    urette    -0.16
    İ    -0.16
     Erk    -0.16
    ãĥĪ    -0.15
    ushima    -0.15
    edin    -0.15
     ãĥĪ    -0.15
    .Std    -0.14
    Positive Logits
    ivate    0.18
    ás    0.16
     اÙĦÙĬÙħÙĨ    0.16
    ias    0.15
     Pam    0.15
    ante    0.15
    heimer    0.15
     AS    0.15
    èĩ¨    0.14
    -as    0.14
    Act Density 0.065%

    No Known Activations