INDEX
    Explanations

    conversations involving expressions of regret or apologies

    Negative Logits
     (    -0.93
     (&   -0.81
     &    -0.78
     [    -0.75
    ).[   -0.74
    )[    -0.72
     ([   -0.68
    )&    -0.66
     “    -0.65
    ')[   -0.64
    Positive Logits
    -"    1.64
    —”    1.61
    -”    1.61
    --"   1.52
    —"    1.52
    -“    1.44
    -",   1.28
    ——”   1.25
    …”    1.20
    —“    1.18
    Act Density 0.309%

    No Known Activations