Explanations
patterns of denial and justification in discussions
Following certain words or phrases
negation or denial
language that denies or minimizes responsibility or wrongdoing—refuting connections, claiming coincidence, or expressing skeptical/ironic dismissal.
Negative Logits
DockStyle        -0.82
+#+#             -0.66
ftagPool         -0.65
ConstraintMaker  -0.60
OGND             -0.58
estekak          -0.58
BeginContext     -0.57
kaarangay        -0.57
tagext           -0.57
pinulongan       -0.56
Positive Logits
eek              0.39
yargs            0.39
Ingram           0.37
البر             0.37
denied           0.37
def              0.36
denied           0.36
interest         0.35
def              0.35
liege            0.34
Activations Density: 0.318%