Explanations
phrases discussing significant news or stories
Head Attr Weights
0:0.02
1:0.01
2:0.16
3:0.11
4:0.08
5:0.03
6:0.13
7:0.08
8:0.08
9:0.05
10:0.13
11:0.07
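
As a reading aid, the head attribution weights listed above can be ranked to see which heads contribute most. This is a minimal sketch: the values are copied from the listing, while the names `head_weights` and `top_heads` are hypothetical and not part of any dashboard API. The listed values sum to 0.95 rather than 1.00; this may be due to rounding in the display, but that is an assumption.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.02, 1: 0.01, 2: 0.16, 3: 0.11, 4: 0.08, 5: 0.03,
    6: 0.13, 7: 0.08, 8: 0.08, 9: 0.05, 10: 0.13, 11: 0.07,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights.

    Ties are broken by the lower head index (hypothetical helper).
    """
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


ranked = top_heads(head_weights)
# Round the sum to two decimals to match the precision of the listed values.
total = round(sum(head_weights.values()), 2)

print(ranked)
print(total)
```

On these values, heads 2, 6, and 10 carry the largest weights, together accounting for 0.42 of the listed total.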
Negative Logits
ּ         -1.66
FTWARE    -1.56
monog     -1.45
Sigma     -1.40
Alley     -1.39
lyak      -1.34
WAY       -1.32
ibly      -1.31
Velvet    -1.30
dit       -1.28
Positive Logits
IMAGES        1.49
hani          1.48
theme         1.40
bleacher      1.36
relevant      1.31
hr            1.27
illuminate    1.26
});           1.23
consume       1.23
pulse         1.22
Activations Density 0.001%