INDEX
Explanations
words and phrases indicating obligations and interpersonal support
After negations or expressions of doubt
cannot help but
Negative Logits
]='\
-0.72
OGND
-0.67
DoNot
-0.58
ggak
-0.57
ⓧ
-0.55
"{\"-0.54
newData
-0.53
thwaite
-0.53
Efq
-0.53
Brahmin
-0.52
Positive Logits
nor
0.71
sondern
0.70
relever
0.68
mbggenerated
0.66
للمعارف
0.65
nor
0.60
而是
0.59
又是一
0.54
nedenle
0.52
estekak
0.51
Activations Density 0.220%