INDEX
    Explanations
    Negative Logits
     legal                          -0.07
     Malays                         -0.07
    ----------------------------    -0.07
     checked                        -0.06
     freund                         -0.06
    Celebr                          -0.06
    _REFRESH                        -0.06
     males                          -0.06
     empres                         -0.06
    'on                             -0.06
    Positive Logits
    }                               0.10
    }\                              0.09
                                    0.08
    .                               0.08
    }}                              0.08
    ism                             0.08
    َة                              0.07
    Tip                             0.07
    ibt                             0.07
    iteit                           0.07
    Activation Density: 0.021%

    No Known Activations