Explanations
words and phrases related to apologies and expressions of regret
Negative Logits
Token           Logit
reb             -0.16
Sylv            -0.15
Decorator       -0.14
strike          -0.14
Horn            -0.14
.experimental   -0.14
la              -0.14
Pis             -0.14
stab            -0.14
rib             -0.13
Positive Logits
Token           Logit
inkel           0.18
agram           0.16
amas            0.16
uger            0.15
óst             0.15
acman           0.14
imm             0.14
zers            0.14
pery            0.14
ucken           0.14
Activation Density: 0.040%