Explanations
instances of the word "blame" and its variations
attributing blame
Negative Logits
basic           -0.33
dall            -0.33
pherson         -0.32
пода            -0.32
basic           -0.32
Dougall         -0.32
roberto         -0.31
съ              -0.30
dell            -0.30
Sed             -0.30
Positive Logits
Blame            0.84
blame            0.80
Blame            0.79
blame            0.75
worthiness       0.68
blaming          0.66
blamed           0.63
istoitu          0.63
autorytatywna    0.61
blames           0.60
Activation Density: 0.006%