INDEX
Explanations
instances of the word "afternoon" and other time-related phrases
Negative Logits
Dawn        -0.14
.ov         -0.14
irling      -0.14
ellite      -0.14
anel        -0.14
kker        -0.14
opher       -0.13
leted       -0.13
cut         -0.13
/generated  -0.13
Positive Logits
iw          0.17
lies        0.15
/e          0.15
edu         0.15
als         0.14
ibaba       0.14
arden       0.14
_reserve    0.14
azar        0.14
.life       0.14
Activations Density 0.014%