INDEX
    Explanations

    words related to negation and differential expressions

    Negative Logits
     pysty          -0.57
     persones       -0.55
     føl            -0.55
     umana          -0.54
     nemico         -0.53
    ńskich          -0.53
     purposes       -0.52
     įsi            -0.52
     menschliche    -0.51
     desselben      -0.51

    Positive Logits
     schon           0.98
     noch            0.82
    windowFixed      0.71
     только          0.70
     gerade          0.69
    )))));           0.68
     nog             0.67
    Только           0.66
     רק              0.65
     тільки          0.63

    Act Density 0.372%

    No Known Activations