Explanations
avoiding problematic and offensive content
Negative Logits
segmented       0.40
periodic        0.37
ionized         0.37
informational   0.37
unstructured    0.37
ਜਾਣ             0.37
inputStream     0.36
clid            0.36
😎              0.36
persistently    0.36
Positive Logits
problematic     0.70
feminist        0.68
problemat       0.68
misog           0.68
feminists       0.65
racist          0.63
Feminist        0.62
antisemit       0.59
sensibilidad    0.57
LGBT            0.57
Activations Density: 0.406%