INDEX
Explanations
phrases related to offering apologies
expressions of apology
Negative Logits
nesota        -0.77
insula        -0.71
iltration     -0.71
irrel         -0.70
eele          -0.68
cryptoc       -0.66
arnaev        -0.66
tnc           -0.66
Ranked        -0.66
infiltration  -0.64
Positive Logits
sorry         0.86
faced         0.78
fully         0.75
GES           0.72
BLE           0.71
excuse        0.71
Guilty        0.69
sorry         0.66
giving        0.66
face          0.66
Activations Density: 0.012%