    Explanations

    punctuation marks and discourse markers

    Negative Logits
    iais        -0.17
    sing        -0.15
    ép          -0.15
    小说        -0.14
     nét        -0.14
    ия          -0.14
    é           -0.14
    ouro        -0.14
    ตอน         -0.14
    _COMPILE    -0.14
    Positive Logits
    ┐           0.19
    ÂĢÂĻ        0.18
    ¦           0.17
    ╗           0.17
    ees         0.17
     Shea       0.17
    /'          0.15
    nat         0.15
    ullivan     0.14
    ration      0.14
    Activation Density: 0.033%

    No Known Activations