Explanations
negations and phrases indicating a lack of effect or significance
Negative Logits
Token          Logit
orius          -0.65
minus          -0.65
FAR            -0.62
odds           -0.62
HELP           -0.61
lessness       -0.60
inferred       -0.60
stood          -0.59
partName       -0.59
iens           -0.58
Positive Logits
Token          Logit
DonaldTrump     0.76
upt             0.69
sylv            0.69
kees            0.64
circulate       0.64
ailing          0.63
atown           0.63
asionally       0.62
ntil            0.61
urther          0.60
Activation Density: 0.261%