INDEX
Explanations
words and phrases related to apologies and admitting mistakes
Negative Logits
arga      -0.16
908       -0.15
IPP       -0.14
ожеÑĤ     -0.14
ahlen     -0.14
¤íĶĦ      -0.14
Alarm     -0.14
é¼ĵ       -0.14
Reuse     -0.13
.spy      -0.13
Positive Logits
apology       0.58
apologies     0.54
apolog        0.51
apologize     0.50
apologized    0.49
apologise     0.46
Ap            0.44
sorry         0.42
Ap            0.40
remorse       0.38
Activations Density 0.396%