INDEX
    Explanations

    keeping prejudiced or insulting

    Negative Logits
     تە          0.41
     CYAN        0.39
     difíciles   0.37
    ́t           0.37
     প্রতিক       0.36
    few          0.36
    難しい       0.36
     FCA         0.36
     bot         0.36
     baton       0.36
    Positive Logits
     Keep        0.92
    Keep         0.89
     keep        0.89
     keeps       0.84
    keep         0.80
    保持         0.77
     Keeps       0.74
     KEEP        0.74
     simplify    0.72
     kept        0.71
    Act Density 0.000%

    No Known Activations