INDEX
    Explanations

    punctuation marks, particularly at the end of phrases

    Negative Logits
     /             -0.50
     Er            -0.50
     I             -0.50
    LayoutStyle    -0.48
     scheme        -0.48
     trans         -0.48
    伝わ           -0.47
     thứ           -0.47
     or            -0.47
     للا           -0.46
    Positive Logits
    ")             1.82
    ?")            1.73
    !")            1.64
    .")            1.63
    '")            1.62
    ?')            1.61
    ')             1.60
    ."]            1.58
    %")            1.58
    !')            1.53
    Act Density 0.135%
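    Dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix, and report activation density as the fraction of token positions where the feature fires. The sketch below illustrates that reading under stated assumptions: the function names, the plain-list matrix layout (d_model x vocab), and the zero activation threshold are hypothetical, and the toy numbers are invented, not this feature's actual weights.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_vec: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding matrix.

    Assumed (hypothetical) layout: decoder_vec has length d_model and
    unembed is d_model x vocab_size. Returns one score per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(d * unembed[i][t] for i, d in enumerate(decoder_vec))
        for t in range(vocab_size)
    ]

def top_logits(decoder_vec: Sequence[float],
               unembed: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) lists of (token, score) pairs."""
    scores = logit_effects(decoder_vec, unembed)
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, largest first
    return positive, negative

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activates above threshold."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > threshold) / len(activations)

# Toy usage with made-up numbers (not this feature's real weights)
vocab = ['")', " I", " or"]
decoder_vec = [1.0, -1.0]
unembed = [[2.0, 0.0, 1.0],
           [0.0, 1.0, 1.0]]
positive, negative = top_logits(decoder_vec, unembed, vocab, k=1)
print(positive, negative)
print(f"{activation_density([0.0, 0.7, 0.0, 0.0]):.2%}")
```

    Under this reading, a density of 0.135% means the feature is active on roughly 1.35 of every 1,000 tokens.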

    No Known Activations