Explanations
expressions of emotional states, particularly those related to feeling, guilt, or reflection on personal experiences
Negative Logits
uzzi         -0.15
onte         -0.15
abo          -0.15
feito        -0.15
icare        -0.14
agen         -0.14
lew          -0.14
icc          -0.14
usterity     -0.14
åģ¥          -0.13
Positive Logits
like         0.27
compelled    0.26
strongly     0.23
obligated    0.22
obliged      0.22
như          0.21
differently  0.20
duty         0.20
comfortable  0.19
sorry        0.19
Activations Density: 0.044%