Explanations
apologies or laments
instances of the word "sorry."
Negative Logits
Token        Logit
helicop      -0.69
ccording     -0.68
rower        -0.67
krit         -0.67
lav          -0.66
holistic     -0.65
tein         -0.65
minecraft    -0.64
vet          -0.64
ief          -0.64
Positive Logits
Token        Logit
sorry        1.06
Sorry        0.91
sorry        0.87
Sorry        0.85
excuse       0.82
pardon       0.79
GES          0.75
Guys         0.72
Ladies       0.70
apologies    0.69
Activations Density: 0.009%
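
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes that "density" means the share of tokens on which the feature's activation is above zero, which is the usual convention for this kind of dashboard but is not stated on this page. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 9 tokens out of 100,000.
acts = [0.0] * 99_991 + [1.5] * 9
density = activation_density(acts)
print(f"{density:.3f}%")  # prints "0.009%"
```

Under this reading, a density of 0.009% means the feature fires on roughly 9 of every 100,000 tokens, which fits a feature tied to a narrow set of tokens such as "sorry".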