INDEX
Explanations
themes relating to activities and social interactions
Negative Logits
churn     -0.18
sift      -0.16
uggle     -0.15
whipped   -0.15
jog       -0.15
PEND      -0.15
cultiv    -0.14
stalk     -0.14
indu      -0.14
sự        -0.14
Positive Logits
enting    0.24
otyping   0.23
parenting 0.22
ancing    0.22
painting  0.22
lawy      0.22
coloring  0.22
singing   0.22
shopping  0.21
eating    0.21
Activations Density 0.511%