INDEX
Explanations
phrases related to claiming or asserting responsibility
Negative Logits
omb         -0.68
Watching    -0.67
arrang      -0.67
ombo        -0.63
Selected    -0.63
schild      -0.60
onen        -0.59
gone        -0.59
Closed      -0.58
watching    -0.58
Positive Logits
innocence        0.89
responsibility   0.82
ownership        0.77
asylum           0.77
credit           0.76
ignorance        0.75
respons          0.74
falsely          0.74
superiority      0.72
edly             0.71
Activations Density: 0.053%