INDEX
    NEGATIVE LOGITS
    Token                   Logit
    "lobal"                 -0.08
    "undo"                  -0.07
    "�i"                    -0.07
    ",...↵↵"                -0.07
    " ı"                    -0.06
    "_flip"                 -0.06
    "센터"                  -0.06
    " IconButton"           -0.06
    "Windows"               -0.06
    " -------↵"             -0.06
    POSITIVE LOGITS
    Token                   Logit
    " gender"               0.09
    " гум"                  0.07
    " Goldman"              0.06
    "EMALE"                 0.06
    ".FormattingEnabled"    0.06
    " Gender"               0.06
    "(cls"                  0.06
    "вами"                  0.06
    "SSL"                   0.06
    "ених"                  0.06
    Activation Density: 0.004%

    No Known Activations