INDEX
    Explanations

    conversational exchanges that reflect opinions and thoughts

    Negative Logits
     simply        -0.17
     my            -0.16
     only          -0.16
    :              -0.16
     simple        -0.15
     will          -0.14
     should        -0.14
     cannot        -0.14
    uck            -0.14
     the           -0.13
    Positive Logits
     yourselves     0.33
     yourself       0.30
    your            0.24
    你的            0.24
     youre          0.23
     Yourself       0.22
     your           0.22
    YOUR            0.20
    )?↵             0.20
     ваш            0.19
    Act Density 0.253%

    No Known Activations