Explanations
concepts of personal responsibility and blame
Head Attribution Weights (head:weight)
0:0.10
1:0.02
2:0.32
3:0.09
4:0.04
5:0.10
6:0.04
7:0.05
8:0.04
9:0.06
10:0.06
11:0.04
Negative Logits
Token       Logit
�醒         -2.85
Sear        -2.66
uberty      -2.65
Exp         -2.58
Expand      -2.48
Vis         -2.40
↑           -2.37
uca         -2.33
Rounds      -2.33
Scan        -2.33
Positive Logits
Token           Logit
blame           6.98
blaming         6.01
blamed          5.76
blames          5.65
culp            5.29
fault           5.16
scapego         5.11
negligence      4.55
responsibility  4.43
Fault           4.31
Activation Density: 0.128%
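The density figure is not defined on this page. As a minimal sketch, assuming it means the fraction of token positions at which the feature's activation is above zero, it could be computed as below. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the dashboard's own definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 128 active positions out of 100,000 tokens.
sample = [0.0] * 99_872 + [1.0] * 128
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.128% would mean the feature fires on roughly one token in every 780.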