INDEX
    Explanations

    avoiding problematic and offensive content

    Negative Logits
    "segmented"       0.40
    " periodic"       0.37
    " ionized"        0.37
    " informational"  0.37
    " unstructured"   0.37
    " ਜਾਣ"            0.37
    "inputStream"     0.36
    "clid"            0.36
    " 😎"             0.36
    " persistently"   0.36
    Positive Logits
    " problematic"    0.70
    " feminist"       0.68
    " problemat"      0.68
    " misog"          0.68
    " feminists"      0.65
    " racist"         0.63
    " Feminist"       0.62
    " antisemit"      0.59
    " sensibilidad"   0.57
    " LGBT"           0.57
    Activation Density: 0.406%

    No Known Activations