INDEX
    Explanations

    with, without, avec, with

    Negative Logits
    " deoarece"    0.38
    "hmC"    0.35
    " aquilo"    0.34
    " joten"    0.34
    "hatan"    0.34
    "itates"    0.33
    " lop"    0.33
    " pois"    0.33
    "hluk"    0.33
    " deshalb"    0.32
    Positive Logits
    " причем"    0.74
    " עם"    0.68
    " avec"    0.63
    " WITH"    0.61
    " with"    0.60
    "ただし"    0.57
    " включая"    0.57
    " without"    0.57
    "하거나"    0.57
    " χωρίς"    0.55
    Activation Density: 1.786%

    No Known Activations