Explanations
phrases expressing apologies
expressions of apology
Negative Logits
tein            -0.82
iltration       -0.78
irrel           -0.73
arnaev          -0.73
insula          -0.69
tnc             -0.67
infiltration    -0.67
helicop         -0.66
minecraft       -0.66
ccording        -0.65
Positive Logits
sorry           1.00
faced           0.81
GES             0.80
sorry           0.78
fully           0.74
excuse          0.74
Guilty          0.71
Sorry           0.70
pardon          0.68
tm              0.67
Activations Density: 0.008%