Explanations
denying responsibility or blame
Negative Logits
Token            Value
confiar          0.44
தன்னை            0.44
potencialmente   0.41
โจทย์             0.40
emed             0.39
ধার              0.38
ponible          0.38
환경             0.38
বিবেচিত          0.38
potenciales      0.37
Positive Logits
Token            Value
responsibility   1.00
wrongdoing       0.94
fault            0.88
blame            0.86
guilt            0.85
responsibility   0.82
involvement      0.80
liability        0.77
Responsibility   0.77
authorship       0.76
Activations Density: 0.010%