Explanations
themes of self-worth and the impact of love on self-perception
Negative Logits
ivec          -0.15
illis         -0.15
igin          -0.15
éĻ            -0.14
URA           -0.14
versations    -0.14
occo          -0.14
ëĿ            -0.13
.bias         -0.13
swers         -0.13
Positive Logits
worth         0.22
approval      0.21
Approval      0.20
Worth         0.20
approval      0.20
worth         0.19
inade         0.18
adequ         0.18
inferior      0.18
Approval      0.18
Activations Density: 0.151%