INDEX
Explanations
sensitive topics and issues
Negative Logits
Entropy      0.43
malos        0.43
friendly     0.41
useless      0.41
autores      0.39
friendly     0.38
painters     0.38
purpose      0.37
machining    0.37
बदमाशों      0.37
Positive Logits
sensitive      1.40
Sensitive      1.20
Sensitive      1.20
controversial  1.20
敏感           1.16
sensitive      1.15
sensitively    1.09
حساس           1.09
topic          1.08
delicate       1.07
Activations Density: 0.019%