INDEX
Explanations
negative sentiment or ethical issues
Negative Logits
collecte    0.55
사용하여    0.52
distributes    0.51
celebrates    0.50
ματο    0.50
pomoc    0.50
جمع    0.50
приклад    0.49
oluştur    0.49
confers    0.49
Positive Logits
resentment    0.62
Didn    0.62
violating    0.62
wrongdoing    0.61
worsening    0.61
uneasy    0.60
नहीं    0.60
tyranny    0.59
undermining    0.59
دلیل    0.58
Activations Density: 3.302%