INDEX
    Explanations

    phrases that indicate assistive actions and support

    Negative Logits
     while
    -0.07
     whereas
    -0.07
     gratuitement
    -0.07
    arena
    -0.06
    nat
    -0.06
    ellij
    -0.06
    ftp
    -0.06
     notamment
    -0.06
    while
    -0.06
    ersen
    -0.06
    Positive Logits
     always
    0.12
    always
    0.11
     ALWAYS
    0.10
    Always
    0.10
     siempre
    0.10
     vždy
    0.10
     Always
    0.10
     всегда
    0.09
     đều
    0.09
     sempre
    0.09
    Act Density 0.044%

    No Known Activations