INDEX
    Explanations

    instruction

    Phrases in prompt headers that explicitly signal task directives or instructions to follow.

    Negative Logits
        " স্বাস্থ্য"     -0.08
        "ỏng"            -0.08
        "PID"            -0.08
        " Rolling"       -0.08
        " Hanging"       -0.08
        " MGM"           -0.08
        " Appetite"      -0.07
        " PID"           -0.07
        " Nights"        -0.07
        " Männer"        -0.07
    Positive Logits
        "/em"            0.09
        " espada"        0.08
        "_fonts"         0.08
        " tomando"       0.08
        " multimedia"    0.08
        "iglio"          0.08
        "Font"           0.08
        "font"           0.08
        " sur"           0.08
        " tomar"         0.08
    Activation Density: 0.012%

    No Known Activations