Explanations
words or phrases indicating personal thoughts, opinions, or emotions
expressions of personal feelings or emotions
Negative Logits
umbn      -0.68
adish     -0.62
andise    -0.62
fig       -0.61
ciating   -0.61
esan      -0.59
enta      -0.59
oys       -0.59
etting    -0.58
uning     -0.58
Positive Logits
compelled      1.23
comfortable    1.16
obligated      1.10
uneasy         1.10
strongly       1.08
confident      1.08
obliged        1.01
betrayed       1.00
passionately   1.00
ashamed        0.96
Activations Density 0.043%