Explanations
instances of the word "sorry" and variations in expressions of regret
Negative Logits
ets              -0.17
erne             -0.16
alo              -0.16
847              -0.15
Alb              -0.14
maya             -0.14
elda             -0.13
arra             -0.13
ery              -0.13
possibilities    -0.13
Positive Logits
apat             0.18
kus              0.17
æĻ´              0.16
isser            0.15
LATED            0.15
about            0.15
omba             0.15
sorry            0.15
ylon             0.14
achel            0.14
Activations Density 0.013%