Explanations
factual information related to diverse topics and contexts
Head Attr Weights
0:0.06
1:0.02
2:0.12
3:0.05
4:0.05
5:0.06
6:0.02
7:0.09
8:0.16
9:0.06
10:0.18
11:0.09
Negative Logits
uthor: -0.88
swer: -0.85
byn: -0.84
writers: -0.83
emerge: -0.82
released: -0.80
witch: -0.80
hops: -0.77
swick: -0.77
ufact: -0.77
Positive Logits
MpServer: 1.00
vec: 0.99
�: 0.95
osexual: 0.88
ジ: 0.86
leptin: 0.85
phenotype: 0.85
Appearances: 0.85
rhetorical: 0.82
mic: 0.82
Activations Density 0.555%