INDEX
Explanations
words related to emotional states, particularly negative emotions like awkwardness, awfulness, and awakeness
expressions of emotions, particularly those conveying surprise or amazement
Negative Logits
Townsend   -0.72
cision     -0.71
iod        -0.64
Feld       -0.64
Luxem      -0.62
Missile    -0.60
Nadu       -0.60
idated     -0.59
ogue       -0.59
leaflets   -0.58
Positive Logits
kward      1.46
akening    1.39
akens      1.36
esome      1.19
dry        1.05
aii        1.04
orld       1.00
aken       1.00
apon       0.98
riter      0.95
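Dashboards like this one commonly derive the logit lists by multiplying the feature's decoder direction with the model's unembedding matrix and then ranking the resulting per-token scores. The source does not say how these particular values were computed, so the following is a minimal sketch under that assumption. The functions `logit_effects` and `top_logits` and the toy vectors are hypothetical and exist only for illustration.

```python
# Hypothetical sketch: rank a feature's direct effect on output-token logits.
# Assumes access to the feature's decoder vector (length d_model) and the
# unembedding matrix (d_model rows x vocab columns). The toy numbers below are
# made up and are not taken from the dashboard.

def logit_effects(decoder_vec, unembed, vocab):
    """Return {token: score}, where score is decoder_vec dotted with unembed column j."""
    d_model = len(decoder_vec)
    scores = {}
    for j, token in enumerate(vocab):
        scores[token] = sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
    return scores

def top_logits(scores, k):
    """Split scores into the k most positive and the k most negative tokens."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative score first
    positive = ranked[::-1][:k]  # most positive score first
    return positive, negative

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["kward", "akening", "Townsend", "cision"]
decoder_vec = [1.0, 0.5]
unembed = [
    [1.0, 0.8, -0.5, -0.3],
    [0.5, 0.6, -0.4, -0.2],
]
positive, negative = top_logits(logit_effects(decoder_vec, unembed, vocab), k=2)
print(positive)
print(negative)
```

Ranking the raw dot products, rather than softmax probabilities, keeps each score a direct, additive contribution of the feature to that token's logit.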
Activation Density: 0.024%
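An activation density of 0.024% equals 0.00024, or about 24 firing positions in every 100,000 tokens (roughly 1 in 4,167). The sketch below assumes that "density" means the fraction of token positions where the feature's activation exceeds zero. The threshold and the helper `activation_density` are assumptions and are not taken from the dashboard.

```python
# Hypothetical sketch: activation density as the fraction of token positions
# where a feature fires. Treating "fires" as activation > threshold (default 0.0)
# is an assumption.

def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Toy data: 24 firing positions out of 100,000 tokens.
acts = [0.0] * 99_976 + [1.3] * 24
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```

In this toy case the result matches the 0.024% shown above by construction, so the example only illustrates the arithmetic and says nothing about how the dashboard sampled its tokens.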