INDEX
    Explanations

    User queries that directly address the assistant in the second person (especially “you”), often as “Do/Can you …?” requests.

    Negative Logits
    -0.08
     Polygon
    -0.07
     loại
    -0.07
    -0.07
     additional
    -0.07
     וכמובן
    -0.07
    .itemId
    -0.07
     Scandin
    -0.07
    -0.07
     словам
    -0.07
    Positive Logits
    0.07
    حة
    0.07
     créé
    0.07
    時点
    0.07
    口号
    0.07
    诊治
    0.07
    by
    0.07
    birth
    0.06
     bất
    0.06
     Reads
    0.06
    Activation Density: 0.041%

    No Known Activations