INDEX
Explanations
emotional expressions and feelings
Negative Logits
иÑĢ          -0.15
cba          -0.14
ĸ            -0.14
/wiki        -0.13
(Optional    -0.13
uitar        -0.13
oris         -0.13
.struct      -0.13
efa          -0.13
umably       -0.13
Positive Logits
strongly     0.28
compelled    0.28
like         0.24
duty         0.23
comfortable  0.22
obliged      0.21
Duty         0.20
như          0.19
obligated    0.19
Like         0.19
Activations Density: 0.025%