INDEX
Explanations
proper nouns, specifically names and affiliations related to publications and authors
Head Attr Weights
0:0.25
1:0.04
2:0.02
3:0.07
4:0.11
5:0.06
6:0.07
7:0.03
8:0.05
9:0.13
10:0.03
11:0.09
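The head attribution weights above can be ranked with a short sketch like the following. It assumes each `index:weight` pair is one attention head and its share of the attribution, which the listing itself does not state; the names `head_attr_weights` and `rank_heads` are illustrative only.

```python
# Values copied from the "Head Attr Weights" listing (head index -> weight).
# Assumption: each entry is one attention head's attribution share.
head_attr_weights = {
    0: 0.25, 1: 0.04, 2: 0.02, 3: 0.07, 4: 0.11, 5: 0.06,
    6: 0.07, 7: 0.03, 8: 0.05, 9: 0.13, 10: 0.03, 11: 0.09,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_attr_weights)
top3 = ranked[:3]                                  # three largest heads
total = round(sum(head_attr_weights.values()), 2)  # sum of listed weights

print("top heads:", top3)
print("sum of weights:", total)
```

Under that reading, heads 0, 9, and 4 carry the largest weights, and the listed values sum to 0.95, with the shortfall from 1.00 plausibly due to rounding in the display.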
Negative Logits
etheless      -2.67
again         -2.26
theless       -2.13
Dave          -2.01
Again         -2.00
fortunately   -1.91
thankfully    -1.89
Mub           -1.89
aleb          -1.89
Josh          -1.89
Positive Logits
paras          2.21
,              2.12
Encyclopedia   2.11
Retrieved      2.10
.,             1.90
specialty      1.88
.;             1.84
+++            1.79
-,             1.78
vol            1.76
Activations Density 0.001%