    Explanations

    instances of denial or contradiction in statements

    Negative Logits
    Token           Logit
    "oun"           -0.17
    "/wiki"         -0.15
    "etsk"          -0.15
    " contrary"     -0.14
    "emens"         -0.14
    "contr"         -0.14
    "aci"           -0.14
    "hrad"          -0.14
    "rá"            -0.13
    " Contr"        -0.13
    Positive Logits
    Token             Logit
    " nevertheless"   0.20
    " nonetheless"    0.17
    "歡"              0.15
    "amon"            0.14
    " stuck"          0.14
    "istra"           0.14
    " Essentially"    0.14
    " факт"           0.14
    "Nevertheless"    0.13
    "AMS"             0.13
    Activation Density: 0.163%

    No Known Activations