    Negative Logits
    -0.07   San
    -0.07   Silence
    -0.07  (token not rendered)
    -0.07  요일
    -0.06   ';↵
    -0.06  _machine
    -0.06  소를
    -0.06   bees
    -0.06  ابعة
    -0.06   EXEC
    Positive Logits
    0.07   Femin
    0.07  detect
    0.07  [][
    0.07  <|eot_id|>
    0.06  _TRAN
    0.06  kept
    0.06  camatan
    0.06   انتخاب
    0.06  Literal
    0.06  Focused
    Activation Density: 0.002%

    No Known Activations