INDEX
    Explanations

    negations or phrases indicating refusal or denial

    Negative Logits
     Савезне              -0.97
    исленность            -0.85
    WebElementEntity      -0.82
     cherchés             -0.81
     snippetHide          -0.81
    曖昧さ回避             -0.77
     autorytatywna        -0.77
    NUMX                  -0.75
    ]--;                  -0.75
    :✨                    -0.75
    Positive Logits
    '                     1.68
                          1.66
    ´                     1.02
    `                     1.00
                          0.85
    â                     0.78
    ʻ                     0.77
    ʼ                     0.76
    &#                    0.76
    ''                    0.74
    Act Density 0.142%
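
The density figure above can be read as a simple ratio. The sketch below assumes that "Act Density" means the percentage of sampled tokens on which the feature's activation exceeds zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only so that the arithmetic reproduces 0.142%.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens with activation > threshold) / (all tokens).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: the feature fires on 142 of 100,000 tokens.
sample = [1.0] * 142 + [0.0] * (100_000 - 142)
print(f"{activation_density(sample):.3f}%")  # prints 0.142%
```

Under that reading, a density of 0.142% would correspond to roughly one firing token in every 700.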

    No Known Activations