Explanations
phrases indicating desire, expectations, and social relationships
Negative Logits
azes      -0.14
relay     -0.14
sami      -0.14
ì¿        -0.14
antics    -0.14
rell      -0.14
anson     -0.14
elastic   -0.13
ongan     -0.13
rap       -0.13
Positive Logits
happiness     0.19
succeed       0.17
receive       0.17
receive       0.16
success       0.16
.idea         0.16
welfare       0.16
receives      0.15
satisfaction  0.15
perience      0.15
Activations Density: 0.205%