INDEX
Explanations
references to exploration and connection with others
Head Attr Weights
0:0.02
1:0.02
2:0.13
3:0.18
4:0.05
5:0.03
6:0.07
7:0.08
8:0.08
9:0.06
10:0.16
11:0.07
Negative Logits
dated       -1.73
pex         -1.57
portion     -1.52
tenance     -1.47
aq          -1.46
ynt         -1.44
ynthesis    -1.42
conn        -1.38
epad        -1.36
gery        -1.35
Positive Logits
Interstitial    1.75
ricanes         1.55
trending        1.55
taboola         1.49
Buccaneers      1.46
Literary        1.46
Kavanaugh       1.45
witty           1.45
Valiant         1.42
Feel            1.41
Activations Density: 0.001%