INDEX
    Explanations

    It detects question-word tokens that introduce queries.

    Negative Logits
    Token (quoted to show leading spaces)    Logit
    ' والإ'       -0.08
    'Slave'       -0.07
    ' الإ'        -0.07
    ' dokonce'    -0.07
    'segue'       -0.06
    '-four'       -0.06
    'الإ'         -0.06
    ' стил'       -0.06
    'oles'        -0.06
    Positive Logits
    Token (quoted to show leading spaces)    Logit
    ' what'       0.25
    ' What'       0.23
    'What'        0.22
    'what'        0.19
    '“What'       0.19
    '"What'       0.19
    ' WHAT'       0.19
    '.What'       0.17
    '.what'       0.14
    'WHAT'        0.14
    Activation Density: 0.141%
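
    The density figure above is most naturally read as the fraction of token positions on which the feature fires. The page does not say how it is computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical, with the sample chosen only so that it reproduces the reported figure.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active. The source page does not define the metric.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 141 of which activate the feature.
sample = [0.0] * 99_859 + [2.0] * 141
density = activation_density(sample)
print(f"{density:.3%}")
```

    Under this reading, a density of 0.141% corresponds to the feature firing on roughly 1 in every 700 tokens, which would fit a feature tied to a narrow set of question words.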

    No Known Activations