    Explanations

    conjunctions and words indicating coordination or connection in sentences

    Negative Logits
        Token       Logit
        "llib"      -0.16
        " Trev"     -0.14
        "å¼¥"       -0.14
        "á»ĵng"     -0.14
        "lash"      -0.13
        "sembl"     -0.13
        "unar"      -0.13
        "ãĥ©ãĤ¹"    -0.13
        "ãģľ"       -0.13
        "Ring"      -0.13
    Positive Logits
        Token        Logit
        " without"   0.17
        " Bened"     0.16
        "æłª"        0.15
        "elu"        0.15
        "bjerg"      0.15
        "-syntax"    0.14
        "aggi"       0.14
        "StdString"  0.14
        "obo"        0.14
        "dbg"        0.13
    Activation Density: 0.193%

    No Known Activations