INDEX
    Explanations

    Questions starting with "how" or "what".

    Explicit user prompts or directives: imperatives and questions that instruct the assistant to perform a task or answer a query.

    Negative Logits
    "azoned"      0.21
    "!),"         0.20
    " รวม"        0.20
    " 비롯"       0.20
    " ഏറെ"       0.20
    "国内外"      0.19
    "étale"       0.19
    "tochy"       0.19
    " itulah"     0.19
    " മറ്റു"      0.19
    Positive Logits
    "𝑥"           0.27
    " I"          0.21
                  0.21
    " میکنم"      0.20
    " violently"  0.20
    "したい"      0.20
    " pharmacy"   0.19
    " F"          0.19
    " potassium"  0.19
    " resistor"   0.19
    Activation Density: 7.617%

    No Known Activations