Explanation
Hyperlinks or URLs in the text
Head Attribution Weights
Head   Weight
0      0.02
1      0.01
2      0.14
3      0.07
4      0.10
5      0.02
6      0.06
7      0.21
8      0.02
9      0.04
10     0.18
11     0.08
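The head weights are easier to read when ranked. Below is a minimal Python sketch that sorts the values listed above and reports how much of the total the top heads carry. The names `HEAD_ATTR_WEIGHTS`, `top_heads`, and `cumulative_share` are hypothetical. The listed weights sum to 0.95 rather than 1.00, presumably because of rounding, so the share is computed against the listed total.

```python
# Head attribution weights copied from the table above (head index -> weight).
HEAD_ATTR_WEIGHTS = {
    0: 0.02, 1: 0.01, 2: 0.14, 3: 0.07, 4: 0.10, 5: 0.02,
    6: 0.06, 7: 0.21, 8: 0.02, 9: 0.04, 10: 0.18, 11: 0.08,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights, in descending order."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


def cumulative_share(weights, k=3):
    """Fraction of the total listed weight carried by the top-k heads."""
    total = sum(weights.values())
    return sum(w for _, w in top_heads(weights, k)) / total


if __name__ == "__main__":
    for head, w in top_heads(HEAD_ATTR_WEIGHTS):
        print(f"head {head}: {w:.2f}")
    print(f"top-3 share: {cumulative_share(HEAD_ATTR_WEIGHTS):.2f}")
```

With these numbers, heads 7, 10 and 2 together carry 0.53 of the listed 0.95, or about 56%.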
Negative Logits
Token      Logit
perm       -1.92
pex        -1.77
ciation    -1.55
FTWARE     -1.54
elist      -1.52
cific      -1.50
rators     -1.46
thood      -1.45
utor       -1.41
ptions     -1.40
Positive Logits
Token      Logit
spotlight  1.51
Bust       1.46
Topic      1.33
charism    1.31
黒         1.31
Iss        1.29
boobs      1.29
Bloody     1.28
Benedict   1.28
Rachel     1.28
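The card does not say how the positive and negative logit values were produced. Dashboards of this kind commonly take the dot product of the feature's decoder direction with each token's column of the model's unembedding matrix, then list the highest and lowest scores. The sketch below illustrates that assumed procedure on toy vectors. `decoder_dir`, `unembed`, `logit_effects` and `top_and_bottom` are hypothetical names, and the numbers are made up for illustration, not taken from this feature.

```python
def logit_effects(decoder_dir, unembed):
    """Score each token by the dot product of the feature direction with its unembedding column.

    decoder_dir: list of d_model floats (hypothetical feature decoder vector).
    unembed: dict mapping token -> list of d_model floats (hypothetical unembedding columns).
    """
    return {
        tok: sum(a * b for a, b in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k=2):
    """Return (k most positive, k most negative) (token, score) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; real models use thousands of dimensions.
    decoder_dir = [1.0, -0.5]
    unembed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 1.0], "d": [-1.0, 0.0]}
    pos, neg = top_and_bottom(logit_effects(decoder_dir, unembed))
    print("positive:", pos)
    print("negative:", neg)
```

In the toy example, token "c" scores 1.5 and "a" scores 1.0 (most positive), while "d" scores -1.0 and "b" scores -0.5 (most negative).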
Activation Density: 0.001%