INDEX
    Explanations

    phrases in a specific format, likely related to statements or quotes in a structured discussion

    instances of punctuation, particularly parentheses and commas

    Negative Logits
     .       -0.66
     --      -0.62
    !        -0.61
     ,       -0.59
     ("      -0.58
     (       -0.54
    .        -0.49
    ---      -0.48
    --       -0.47
     and     -0.47
    Positive Logits
    ),"      2.82
    )",      2.80
    )."      2.73
    )"       2.71
    )</      2.70
    )[       2.51
    )=       2.41
    )]       2.40
    )/       2.35
    )'       2.33
    Activation Density 0.013%
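    Activation density is commonly reported as the fraction of sampled tokens on which the feature fires, meaning its activation is above zero. At 0.013% this works out to about 13 of every 100,000 tokens. The following is a minimal sketch that assumes this definition and a flat sequence of per-token activations. The function names and the `threshold` parameter are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`
    (assumed definition of activation density)."""
    total = 0
    firing = 0
    for a in activations:
        total += 1
        if a > threshold:
            firing += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return firing / total

def as_percent(density: float) -> str:
    """Format a density the way the dashboard line does, e.g. '0.013%'."""
    return f"{density * 100:.3f}%"
```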

    No Known Activations