INDEX
Explanations
expressions of apology or regret
Negative Logits
alo              -0.15
ets              -0.15
possibilities    -0.15
/Dk              -0.14
šet              -0.14
jeopardy         -0.14
irk              -0.14
imary            -0.14
ä¿               -0.13
ini              -0.13
Positive Logits
/not             0.17
kus              0.17
ably             0.16
couldn           0.15
meant            0.15
bout             0.15
inconvenience    0.15
isser            0.15
ablish           0.15
couldn           0.14
Activations Density: 0.030%