INDEX
    Explanations

    punctuation marks, specifically different types of apostrophes

    Negative Logits
    " للاسماء"          -0.84
    " CreateTagHelper"  -0.69
    "ViewFeatures"      -0.68
    "Hentet"            -0.65
    "RegressionTest"    -0.64
    "AddTagHelper"      -0.59
    " kasarigan"        -0.59
    " صوتيه"            -0.58
    "awtextra"          -0.58
    "发表于"            -0.57
    Positive Logits
    "s"                  0.71
    "$'"                 0.39
    "|'"                 0.35
    "sweise"             0.35
    "Twas"               0.32
    (token not rendered) 0.32
    "ién"                0.32
    "éndole"             0.30
    "deki"               0.30
    "sthe"               0.29
    Activation density: 0.115%

    No Known Activations