INDEX
Explanations
instances of blame and lack of accountability
Negative Logits
icari   -0.17
اÙģÙĤ   -0.14
iband   -0.14
entai   -0.14
renc    -0.14
kå      -0.14
_TI     -0.14
èĻij     -0.14
pragma  -0.14
ajs     -0.14
Positive Logits
blame           0.58
blaming         0.46
blames          0.46
blamed          0.44
responsibility  0.37
fault           0.36
责任            0.33
責              0.32
Responsibility  0.31
Respons         0.29
Activations Density 0.240%
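
A density figure like the one above is commonly read as the fraction of token positions on which the feature is active. The sketch below assumes that reading; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 6 of 2500 token positions.
sample = [0.0] * 2494 + [1.3, 0.7, 2.1, 0.4, 0.9, 3.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.240% corresponds to the feature firing on roughly 24 of every 10,000 tokens.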