INDEX
    Explanations

    punctuation marks, especially exclamation points and periods

    Negative Logits
    ãģ¯ãģļ    -0.17
    gnore    -0.15
    æŃ£    -0.13
    âĢĥ    -0.13
    ramid    -0.13
     آخرÛĮÙĨ    -0.13
     Dou    -0.13
    ãģĹãģı    -0.13
    412    -0.13
     (*(    -0.13
    Positive Logits
    1    0.40
    01    0.28
    Û±    0.24
    ï¼ij    0.24
    âijł    0.20
    âĦĸ    0.18
    âĤģ    0.18
    âĸį    0.17
    anik    0.16
     firstly    0.16
    Activation Density: 0.081%

    No Known Activations