Explanations
expressions of remorse and requests for forgiveness
Expressions of apology and regret
apologizing and remorse
Negative Logits
onlook         -0.53
iVar           -0.53
jealous        -0.52
"','           -0.52
envy           -0.51
Datuak         -0.51
riv            -0.51
mola           -0.51
empowerment    -0.50
chenkt         -0.50
Positive Logits
apologized     1.10
apology        1.08
remorse        1.05
apologizing    1.03
apologize      0.98
apologised     0.94
apologies      0.89
apologe        0.83
apologise      0.83
apolog         0.78
Activations Density: 0.328%
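
This page does not define the density figure. The sketch below assumes it is the share of sampled tokens on which the feature's activation is nonzero. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the page itself does not state the definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 8 tokens; the feature fires on 2 of them.
sample = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.4, 0.0]
print(f"{activation_density(sample):.3%}")  # prints 25.000%
```

Under this reading, a density of 0.328% would mean the feature fires on roughly 3 of every 1,000 tokens.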