    Explanations

    regarding or concerning

    Detects tokens that mark the beginning of the assistant's responses, i.e., words frequently used at the start of assistant utterances.

    Negative Logits
    Token  Logit
    "把它"  0.38
    "ಣಿ"  0.34
    " intuitive"  0.33
    "quint"  0.33
    " 这是"  0.33
    "ന്"  0.32
    " ভেবে"  0.32
    "𝖑"  0.32
    " ultimate"  0.32
    " bằng"  0.31
    Positive Logits
    Token  Logit
    "Regarding"  1.94
    " Regarding"  1.91
    " regarding"  1.88
    "regarding"  1.68
    "Concerning"  1.63
    "至于"  1.61
    " Concerning"  1.59
    " concernant"  1.49
    " щодо"  1.44
    "至於"  1.44
    Activation Density: 0.010%

    No Known Activations
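
    To put the density figure listed above in concrete terms, here is a minimal sketch. It assumes the density is the fraction of token positions on which the feature has a nonzero activation; the page does not state that definition, so treat it as an assumption. The function name `activation_density` and the toy data are hypothetical and are not taken from any tool's API.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, exactly one of which fires.
toy = [0.0] * 9_999 + [2.5]
density = activation_density(toy)
print(f"{density:.3%}")
```

    Under that assumption, a density of 0.010% corresponds to roughly one active position per 10,000 tokens, which is the ratio the toy data encodes.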