INDEX
    Explanations
    Negative Logits
     sắt
    -0.07
     we
    -0.07
    -0.06
    шли
    -0.06
     السلام
    -0.06
    .Location
    -0.06
    —we
    -0.06
    Microsoft
    -0.06
    -0.06
    نى
    -0.06
    Positive Logits
     curved
    0.08
    >↵↵
    0.07
     azalt
    0.07
     уб
    0.07
    ?“↵↵
    0.06
    _capture
    0.06
      ↵  ↵
    0.06
    !”↵↵
    0.06
    ा↵↵
    0.06
    _BITS
    0.06
    Act Density 0.024%

    No Known Activations