INDEX
    Explanations

    punctuation marks, specifically commas

    Negative Logits
    leſs       -1.01
    eſt        -0.85
     itſelf    -0.83
    neſs       -0.82
     Eſ        -0.79
     faſt      -0.79
     Houſe     -0.78
     myſelf    -0.78
    ſel        -0.77
     Spon      -0.77
    POSITIVE LOGITS
               2.86
     ,         1.83
    ),         1.79
     ،         1.68
    ،          1.66
    ),         1.57
    ,\         1.49
    ”,         1.49
    %,         1.43
    ,          1.40
    Act Density 0.072%

    No Known Activations