Explanations
themes related to contrasting experiences of joy and despair
Negative Logits
illon    -0.15
aight    -0.15
480      -0.14
arton    -0.14
ieval    -0.14
.$.      -0.14
itur     -0.14
yles     -0.14
alon     -0.14
mates    -0.14
Positive Logits
èά             0.23
ous            0.21
like           0.18
/ext           0.18
-like          0.17
uous           0.16
/exp           0.15
ously          0.15
proportions    0.15
istic          0.15
Activations Density: 0.272%