INDEX
Explanations
words related to assigning blame or responsibility
instances of the word "blame"
Negative Logits
Token    Logit
ires     -0.75
chan     -0.74
raised   -0.70
artney   -0.70
ouver    -0.69
irl      -0.69
opened   -0.68
forms    -0.67
iring    -0.67
quire    -0.65
Positive Logits
Token        Logit
blame        1.38
blames       0.95
blaming      0.94
attribut     0.88
blamed       0.87
forgiven     0.80
forgiveness  0.80
culp         0.78
culprit      0.78
burden       0.71
Activations Density: 0.006%