INDEX
    Explanations

    questions, tips, answers, and instructional prompts

    structured query and response formats in text

    Negative Logits
    Token            Logit
    " disg"          -0.76
    "cale"           -0.70
    "ité"            -0.68
    "undai"          -0.67
    "aps"            -0.64
    "creen"          -0.63
    " indiscrim"     -0.63
    " glac"          -0.62
    "paces"          -0.61
    "ides"           -0.60
    Positive Logits
    Token            Logit
    " #"             0.99
    " Number"        0.98
    " Yourself"      0.96
    " Summary"       0.93
    " Explan"        0.89
    " Description"   0.86
    "!:"             0.86
    " Regarding"     0.85
    " Abuse"         0.84
    ":"              0.84
    Act Density 0.200%

    No Known Activations