INDEX
Explanations
apologies or expressions of regret
instances of the word "sorry" or variations of it
Negative Logits
tnc        -0.74
rouse      -0.71
krit       -0.68
arnaev     -0.67
irrel      -0.67
eele       -0.67
tein       -0.67
Ranked     -0.66
minecraft  -0.64
helicop    -0.64
Positive Logits
sorry      1.11
excuse     0.89
Sorry      0.87
sorry      0.86
Sorry      0.84
GES        0.83
giving     0.78
apologies  0.75
guys       0.74
pardon     0.71
Activations Density 0.012%
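
The page does not say how the density figure is computed. A minimal sketch, assuming it means the share of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to illustrate the arithmetic:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" is the fraction of sampled tokens on which the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up sample: the feature fires on 3 of 25,000 token positions.
sample = [0.0] * 24997 + [1.3, 0.4, 2.1]
print(f"{activation_density(sample):.3f}%")
```

Under that assumed definition, a value of 0.012% would correspond to the feature firing on roughly 12 of every 100,000 tokens.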