INDEX
Explanations
references to scientific phenomena and experimental setups
Head Attr Weights
0:0.02
1:0.01
2:0.24
3:0.09
4:0.09
5:0.03
6:0.03
7:0.09
8:0.06
9:0.04
10:0.16
11:0.09
Negative Logits
sqor        -1.83
adiq        -1.64
reci        -1.58
uer         -1.57
akings      -1.55
initions    -1.54
roy         -1.51
appings     -1.50
20439       -1.47
aldi        -1.47
Positive Logits
presided        1.69
consisting      1.61
overseen        1.60
supplemented    1.59
confines        1.57
reminiscent     1.53
Called          1.52
styled          1.51
livious         1.46
attic           1.44
Activations Density 0.318%