INDEX
    Explanations

    the presence of negation or arguments against common beliefs

    Negative Logits
    eres        -0.15
    gnore       -0.15
     revert     -0.14
    ucker       -0.14
    aces        -0.14
     prov       -0.14
    ÑģÑĤоÑĢ      -0.14
    -assets     -0.14
    OLL         -0.14
    Foreground  -0.14
    Positive Logits
    arella      0.16
    ulado       0.15
    Slash       0.15
    enuity      0.15
    alth        0.15
    iasi        0.14
    adel        0.14
     Mand       0.14
    imity       0.13
    chine       0.13
    Act Density 0.091%

    No Known Activations