    Negative Logits
    " sag"          -0.07
    "رغ"            -0.07
    "userName"      -0.06
    " الكثير"       -0.06
    ";↵↵↵"          -0.06
    "😒"            -0.06
    ".*;↵↵"         -0.06
    " walking"      -0.06
    " pregnant"     -0.06
    " ()=>{↵"       -0.06
    Positive Logits
    (no visible token)   0.08
    " bietet"            0.07
    " matched"           0.07
    " Informationen"     0.07
    "guards"             0.07
    (no visible token)   0.07
    " END"               0.07
    " The"               0.07
    " faithfully"        0.07
    "buttons"            0.06
    Activation Density: 0.018%

    No Known Activations