INDEX
    Explanations

    questions and responses from a structured conversation

    question formats and references to inquiries or prompts

    Negative Logits
    Token            Logit
    " revel"         -0.70
    "angering"       -0.69
    " outweigh"      -0.69
    " bloom"         -0.68
    " doomed"        -0.68
    "utton"          -0.68
    "aden"           -0.67
    " overshadow"    -0.66
    " stakes"        -0.66
    " culmin"        -0.65
    Positive Logits
    Token            Logit
    "Hello"          1.35
    " Hi"            1.30
    "Hi"             1.28
    " Hello"         1.25
    "reetings"       1.14
    " Hey"           1.08
    "Question"       1.06
    "Dear"           1.05
    "Hey"            1.03
    " Okay"          1.02
    Activation Density: 0.138%
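
As a rough illustration of what a density figure like this usually measures, the sketch below computes the fraction of sampled tokens on which a feature fires. It assumes that "active" means an activation strictly above zero; the function name, the threshold, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical per-token activations: 3 active tokens out of 1000.
sample = [0.0] * 997 + [0.4, 1.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.138% would mean the feature fires on roughly 1 or 2 of every 1,000 sampled tokens.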

    No Known Activations