INDEX
    Explanations

    negations or forms of denial in the text

    Negative Logits
    orp
    -0.16
    ndern
    -0.16
    avor
    -0.15
    avou
    -0.15
    kiem
    -0.14
    criptor
    -0.14
    hiba
    -0.14
    キング
    -0.14
    thon
    -0.14
     Tan
    -0.14
    Positive Logits
     thì
    0.22
    的话
    0.22
     then
    0.18
    then
    0.16
     THEN
    0.16
    则
    0.15
    ，则
    0.15
    fte
    0.15
     Hak
    0.15
    olas
    0.15
    Activation Density 0.072%

    No Known Activations