INDEX
Explanations
themes centered around love and relationships
Negative Logits
ootball   -0.15
egers     -0.15
elsen     -0.15
λαν       -0.15
warz      -0.15
uman      -0.15
swith     -0.14
esser     -0.14
ural      -0.14
wner      -0.14
Positive Logits
fully     0.17
be        0.16
full      0.15
rug       0.15
ÙģÙĦ      0.15
ably      0.15
-kind     0.14
joy       0.14
kind      0.14
Sala      0.14
Activations Density 0.067%