INDEX
Explanations
words or phrases expressing strong emotional evaluations like "egregious," "saddening," and "laughable."
words that indicate negative or distressing qualities and emphasize their significance
Negative Logits
warr          -0.63
incorpor      -0.63
Nope          -0.61
comr          -0.61
invincible    -0.59
shenan        -0.59
hemor         -0.58
bye           -0.58
wills         -0.57
broom         -0.57
Positive Logits
considering   1.12
given         1.06
given         1.04
because       0.97
because       0.87
when          0.87
insofar       0.86
owing         0.81
since         0.81
today         0.78
Activations Density 0.204%