Explanations
expressions of regret or apologies
Negative Logits
Token          Logit
fman           -0.88
kefeller       -0.85
uminati        -0.77
apon           -0.71
ament          -0.71
ossession      -0.70
indal          -0.69
conservancy    -0.68
ificent        -0.68
ificantly      -0.67
Positive Logits
Token          Logit
Sorry          1.01
sorry          0.92
Sorry          0.89
Invalid        0.87
Failed         0.86
sorry          0.84
unsupported    0.80
mistaken       0.77
miscar         0.77
:(             0.77
Activations Density: 0.040%
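
As a rough illustration of how a density figure like this might be computed, the sketch below assumes density means the fraction of token positions on which the feature's activation exceeds a threshold (zero here). That definition, the function name `activation_density`, and the sample activations are all assumptions for illustration, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: the page does not state how density is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 2 of 5000 token positions.
acts = [0.0] * 4998 + [1.3, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # formats the fraction as a percentage
```

Under this reading, a density of 0.040% would mean the feature is active on roughly 4 in every 10,000 tokens, which fits a narrow feature that mostly fires on apology and error wording.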