INDEX
    Explanations

    expressions of warmth and friendliness

    Negative Logits
      Token     Logit
      acro      -0.16
      ivic      -0.16
      oman      -0.15
      /do       -0.15
      ocket     -0.15
      horn      -0.14
      904       -0.14
      jed       -0.14
      lor       -0.14
      arb       -0.14

    Positive Logits
      Token     Logit
      elter      0.17
      ASCADE     0.16
      illac      0.15
      lok        0.15
      erton      0.14
      elry       0.14
      argo       0.14
      elsey      0.14
      lier       0.14
      inkel      0.14

    Activation Density: 0.016%

    No Known Activations