INDEX
    Explanations

    instances of the word "assistant" and related variations

    Negative Logits
     itſelf         -1.06
     purpoſe        -1.04
     poffible       -0.99
    ſelf            -0.99
     greateſt       -0.99
     ―――――          -0.97
     pleaſure       -0.97
     uſ             -0.97
     reaſon         -0.96
     Diſ            -0.94
    Positive Logits
     hire            0.82
     Hire            0.68
     Dog             0.64
                     0.61
    ↵↵↵              0.57
     Do              0.56
     stars           0.56
     |               0.56
     include         0.55
     '               0.54
    Act Density 0.091%

    No Known Activations