INDEX
    Explanations

    Tokens in user requests that address the assistant, especially second-person possessive or imperative phrasing such as "your" or "give me".

    Negative Logits
        " стор"          -0.07
        " ملي"           -0.07
        "قيق"            -0.07
        (not rendered)   -0.06
        "VV"             -0.06
        " circulating"   -0.06
        "()));↵"         -0.06
        "قيقة"           -0.06
        "/dis"           -0.06
        "_CAR"           -0.06
    Positive Logits
        "-Semitism"      0.07
        " jejich"        0.07
        "forman"         0.07
        " FontStyle"     0.07
        " eiusmod"       0.07
        " quarterly"     0.07
        "toBeFalsy"      0.06
        "-flat"          0.06
        ".changed"       0.06
        "оки"            0.06
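The dashboard does not say how these two lists are produced. A minimal sketch, assuming the feature has one logit-effect value per vocabulary token, is to sort that vector and take the k lowest entries as the negative list and the k highest as the positive list. The function name `top_logit_lists` is hypothetical. Computing the effect vector as the feature's decoder direction projected through the unembedding matrix is a common convention, but it is an assumption here. The toy inputs reuse five token/value pairs from the lists above.

```python
import numpy as np

def top_logit_lists(logit_effects, vocab, k=10):
    """Return (negative, positive) lists of (token, value) pairs.

    Assumes logit_effects[i] is the feature's effect on the logit of vocab[i]
    (hypothetically, e.g., W_dec[feature] @ W_U; not confirmed by the dashboard).
    """
    effects = np.asarray(logit_effects, dtype=float)
    order = np.argsort(effects, kind="stable")  # ascending, ties keep input order
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy vocabulary built from five entries shown in the lists above.
vocab = [" quarterly", "VV", " circulating", "-flat", "forman"]
effects = [0.07, -0.06, -0.06, 0.06, 0.07]
neg, pos = top_logit_lists(effects, vocab, k=2)
print(neg)
print(pos)
```

Quoting each token, as in the lists above, keeps leading spaces visible, which matters because " quarterly" and "quarterly" are usually different tokens.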
    Act Density 0.036%
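Activation density is usually read as the fraction of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. The helper name `activation_density` and the synthetic activation array are hypothetical and only reproduce the 0.036% figure: 36 active positions out of 100,000.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Synthetic example: the feature is active on 36 of 100,000 positions.
acts = np.zeros(100_000)
acts[:36] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.036%
```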

    No Known Activations