INDEX
    Explanations

    words and phrases indicating obligations and interpersonal support

    After negations or expressions of doubt

    Negative Logits
    ]='\            -0.72
    OGND            -0.67
    DoNot           -0.58
    ggak            -0.57
                    -0.55
     "{\"           -0.54
     newData        -0.53
    thwaite         -0.53
     Efq            -0.53
     Brahmin        -0.52
    Positive Logits
     nor            0.71
     sondern        0.70
     relever        0.68
    mbggenerated    0.66
     للمعارف        0.65
    nor             0.60
    而是            0.59
    又是一          0.54
     nedenle        0.52
     estekak        0.51
    Activation Density: 0.220%

    No Known Activations