INDEX
Explanations
phrases related to guiding the reader's attention or providing warnings
Head Attr Weights
0:0.04
1:0.02
2:0.06
3:0.17
4:0.10
5:0.05
6:0.14
7:0.04
8:0.06
9:0.09
10:0.09
11:0.10
Negative Logits
buckets    -1.55
TTL        -1.42
Titanic    -1.40
luster     -1.36
Genie      -1.29
Bounty     -1.28
garbage    -1.27
bucket     -1.27
Boo        -1.26
Bucket     -1.25
Positive Logits
llor       1.44
cent       1.40
ghai       1.40
doi        1.33
]-         1.32
ndra       1.31
iven       1.31
np         1.31
ior        1.28
ivot       1.26
Activations Density 0.075%