INDEX
    Explanations

    expressions indicating negation or refutation

    Negative Logits
        " вмест"            -0.14   (Russian word fragment; cf. "вместо", "instead")
        "ombat"             -0.14
        "mina"              -0.14
        "ais"               -0.14
        " instead"          -0.13
        "ains"              -0.13
        "istrat"            -0.13
        "ivid"              -0.13
        "UGE"               -0.13
        " поба"             -0.13
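    Positive Logits
        " necessarily"       0.70
        " automatically"     0.49
        "ecessarily"         0.41
        " automatic"         0.38
        " Automatically"     0.36
        " automáticamente"   0.34   (Spanish: "automatically")
        " обязательно"       0.31   (Russian: "necessarily", "obligatorily")
        " always"            0.31
        "必"                 0.30   (Chinese: "must", "necessarily")
        "一定"               0.30   (Chinese: "certainly", "necessarily")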
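A minimal sketch of how token/logit lists like the two above are commonly produced for a sparse-autoencoder feature, assuming the feature's decoder direction is projected straight onto the model's unembedding matrix (ignoring any layer-norm folding a given dashboard may apply). The function name `top_and_bottom_logits`, the inputs `decoder_dir`, `W_U`, and `vocab`, and all toy numbers are hypothetical and not taken from this page.

```python
import numpy as np


def top_and_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Return the k highest and k lowest (token, logit) pairs for one feature.

    decoder_dir: (d_model,) decoder vector of the feature (assumed input).
    W_U: (d_model, vocab_size) unembedding matrix (assumed input).
    vocab: list of vocab_size token strings.
    """
    # Direct effect of the feature direction on every vocabulary logit.
    logits = np.asarray(decoder_dir) @ np.asarray(W_U)  # shape (vocab_size,)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 tokens (made-up values).
vocab = [" necessarily", " automatically", "ombat", " instead"]
W_U = np.array([[0.70, 0.49, -0.10, -0.14],
                [0.00, 0.00, 0.00, 0.00]])
decoder_dir = np.array([1.0, 0.0])

positive, negative = top_and_bottom_logits(decoder_dir, W_U, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Leading spaces in token strings are kept on purpose: in byte-level BPE vocabularies " instead" (word start) and "instead" (word continuation) are different tokens, which is why fragments such as "ecessarily" can appear next to " necessarily".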
    Activation density: 0.105%
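Activation density is usually read as the share of token positions on which the feature fires; 0.105% works out to roughly one token in 950. The sketch below assumes "fires" means an activation strictly above zero; the function name `activation_density`, its `threshold` parameter, and the toy data are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of token positions whose activation is above `threshold`.

    acts: 1-D array of one feature's activations over a token sample.
    threshold: activation level counted as "firing" (assumed to be 0).
    """
    acts = np.asarray(acts, dtype=float)
    firing = np.count_nonzero(acts > threshold)
    return 100.0 * firing / acts.size


# Toy sample: 2,000 tokens, the feature fires on 2 of them.
acts = np.zeros(2000)
acts[[5, 999]] = 1.3
print(f"{activation_density(acts):.3f}%")  # prints 0.100%
```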

    No Known Activations