Explanations
promotions and features related to online content subscriptions and events
Head Attr Weights
0:0.03
1:0.06
2:0.10
3:0.06
4:0.04
5:0.12
6:0.11
7:0.11
8:0.09
9:0.05
10:0.11
11:0.08
Negative Logits
nown      -1.49
ernel     -1.29
chwitz    -1.28
lyak      -1.26
alon      -1.26
ivas      -1.26
offic     -1.22
yang      -1.21
cific     -1.20
hement    -1.20
Positive Logits
archive         1.50
curated         1.31
Featured        1.25
playlist        1.24
subscription    1.24
essays          1.18
Deadline        1.17
renaissance     1.17
bookmark        1.15
rebate          1.13
Activations Density 0.001%