INDEX
    Explanations

    expressions of care and support in interpersonal interactions

    Negative Logits
    Token    Logit
     (“      -0.29
             -0.24
    ”,       -0.22
    ”),      -0.22
             -0.22
    ”.       -0.21
    ”).      -0.21
             -0.20
    =”       -0.20
    “,       -0.20
    Positive Logits
    Token    Logit
    ."↵      0.24
    ."↵↵     0.21
    !"↵      0.19
    ()"↵     0.19
    ."]↵     0.18
    .)↵      0.18
    ?"↵      0.17
    !"↵↵     0.17
    .)↵↵     0.17
    ."↵↵↵    0.16
    Activation Density: 0.503%

    No Known Activations