INDEX
Explanations
expressions of care and support in interpersonal interactions
Negative Logits
(“      -0.29
”       -0.24
”,      -0.22
”),     -0.22
”       -0.22
”.      -0.21
”).     -0.21
“       -0.20
=”      -0.20
“,      -0.20
Positive Logits
."↵     0.24
."↵↵    0.21
!"↵     0.19
()"↵    0.19
."]↵    0.18
.)↵     0.18
?"↵     0.17
!"↵↵    0.17
.)↵↵    0.17
."↵↵↵   0.16
Activations Density 0.503%