INDEX
Explanations
emotion-related words, such as "excitement," "sadness," and "arrogance."
emotions and negative experiences
Negative Logits
REE          -0.65
Activity     -0.64
ppel         -0.63
onal         -0.62
ドラ         -0.61
LAND         -0.61
Go           -0.61
asketball    -0.60
WD           -0.59
ĵ            -0.59
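Byte-level BPE vocabularies (GPT-2 style) store every raw byte as a printable character, so token strings can look garbled. For example, ãĥīãĥ© decodes to ドラ, and ĵ is a lone byte 0x93 that is not valid UTF-8 by itself. The sketch below assumes the model uses GPT-2's byte-to-character table (an assumption inferred from the token strings, not stated on this page). It inverts that table to recover readable text; `decode_token` is a hypothetical helper name.

```python
def bytes_to_unicode():
    """Build GPT-2's reversible byte -> printable-character table."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Bytes that are not printable get shifted to code points 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# Inverse table: printable character -> original byte value.
BYTE_DECODER = {c: b for b, c in bytes_to_unicode().items()}

def decode_token(token):
    """Map a byte-level BPE token string back to readable text (hypothetical helper)."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    # Partial UTF-8 sequences (e.g. a lone continuation byte) become U+FFFD.
    return raw.decode("utf-8", errors="replace")

print(decode_token("ãĥīãĥ©"))  # ドラ
print(repr(decode_token("ĵ")))  # '\ufffd'
```

Tokens that are plain ASCII, such as `asketball`, pass through unchanged.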
Positive Logits
iest          1.25
inherent      1.11
iness         0.94
emanating     0.94
afforded      0.94
surrounding   0.93
plag          0.86
lessness      0.85
quot          0.83
aspect        0.82
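The page does not say how these logit scores are computed. A common approach for sparse-autoencoder features, assumed here, is to project the feature's decoder direction through the model's unembedding matrix and report the k lowest and k highest entries. The sketch below uses toy, hypothetical weights and ignores any final LayerNorm scaling a real model would apply. `top_bottom_logits` is a hypothetical helper name.

```python
import numpy as np

def top_bottom_logits(decoder_row, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption: effect = decoder direction @ unembedding (no LayerNorm).
    decoder_row has shape (d_model,); W_U has shape (d_model, vocab_size).
    """
    effects = decoder_row @ W_U            # shape: (vocab_size,)
    order = np.argsort(effects)            # token indices, ascending by effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens (values are made up).
vocab = ["inherent", "REE", "iest", "Go"]
W_U = np.array([[0.5, -0.2, 1.2, 0.0],
                [3.0, 3.0, 3.0, 3.0]])
decoder_row = np.array([1.0, 0.0])

negative, positive = top_bottom_logits(decoder_row, W_U, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```

Because the effect is linear in the decoder direction, ranking by it shows which output tokens the feature directly pushes up or down, independent of any particular input.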
Activation Density: 0.296%
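Activation density is usually read as the fraction of tokens on which the feature fires. A density of 0.296% therefore means roughly 3 tokens per 1,000. The sketch below assumes "fires" means an activation strictly above zero; the dashboard may use a different threshold or token sample. `activation_density` is a hypothetical helper name, and the activations are made-up toy values.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds the threshold."""
    acts = np.asarray(activations)
    return float(np.mean(acts > threshold))

# Toy activations for one feature over 8 tokens (made-up values).
acts = np.array([0.0, 0.0, 0.3, 0.0, 1.2, 0.0, 0.0, 0.0])
density = activation_density(acts)
print(f"{density:.3%}")  # 25.000%
```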