INDEX
    Explanations

    punctuation marks, particularly periods

    Negative Logits
    幹
    -0.16
    ixo
    -0.15
     Sher
    -0.15
    itung
    -0.14
    orro
    -0.14
    やす
    -0.14
    ost
    -0.14
    885
    -0.14
    stry
    -0.14
    istra
    -0.14
    Positive Logits
    ίων
    0.15
    phis
    0.15
    errat
    0.15
    روز
    0.15
    osci
    0.15
    álo
    0.15
     biên
    0.14
    森
    0.14
    woff
    0.14
    nton
    0.14
    Activation Density: 0.004%

    No Known Activations