Explanations
expressions of apology or regret
Negative Logits
imonial             -0.16
migrationBuilder    -0.15
atables             -0.15
uye                 -0.15
arga                -0.15
_PP                 -0.15
urette              -0.15
egie                -0.14
entionPolicy        -0.14
eatures             -0.14
Positive Logits
apologies           0.27
apologize           0.27
apologized          0.27
apology             0.26
apolog              0.26
Ap                  0.26
ap                  0.22
Ap                  0.22
regrets             0.21
sorry               0.21
Activations Density: 0.049%