Explanations
apologies or expressions of regret
instances of apology or expressions of regret
Negative Logits
Ranked       -0.79
tnc          -0.74
iltration    -0.71
arnaev       -0.71
irrel        -0.69
eele         -0.68
edience      -0.68
rouse        -0.68
psey         -0.67
minecraft    -0.67
Positive Logits
sorry        1.08
excuse       0.92
GES          0.85
sorry        0.85
Sorry        0.80
Sorry        0.74
giving       0.73
apologies    0.72
tm           0.72
Customers    0.69
Activations Density 0.014%