Explanations
statements indicating blame or responsibility
references to personal responsibility or blame
Negative Logits
atform    -0.81
atos      -0.81
CHO       -0.80
ISTER     -0.80
iolet     -0.78
¥µ        -0.75
chin      -0.74
thia      -0.74
女        -0.74
apist     -0.73
Positive Logits
lessly    1.05
forgiven  0.86
fault     0.83
Fault     0.81
faults    0.80
lessness  0.77
ously     0.76
less      0.71
Karma     0.68
line      0.68
Activations Density 0.019%