INDEX
Explanations
questions and expressions of surprise
expressions of surprise or disbelief
Negative Logits
diffusion    -0.72
ezvous       -0.71
preference   -0.64
coasts       -0.59
idem         -0.58
favors       -0.58
Nieto        -0.58
mutually     -0.58
aturday      -0.56
liber        -0.55
Positive Logits
?!     1.09
?!"    1.05
?)     1.00
Huh    1.00
???    0.99
?".    0.97
!?"    0.96
Why    0.96
?).    0.95
!?     0.94
Activations Density 0.446%