INDEX
    Explanations

    instances of conversational prompts or questions

    NEGATIVE LOGITS
    "ojis"       -0.16
    "eniable"    -0.15
    "ön"         -0.15
    "uhn"        -0.15
    "uve"        -0.14
    "Sphere"     -0.14
    " oto"       -0.13
    "ubi"        -0.13
    "Sadly"      -0.13
    "İY"         -0.13
    POSITIVE LOGITS
    " fear"       0.57
    " Fear"       0.51
    "Fear"        0.50
    " fret"       0.48
    " worry"      0.46
    " don"        0.46
    "don"         0.41
    " Don"        0.39
    "Don"         0.38
    " worries"    0.38
    Activation Density: 0.125%

    No Known Activations