Explanations
references to honor and recognition
Head Attr Weights
0:0.05
1:0.08
2:0.05
3:0.05
4:0.09
5:0.04
6:0.09
7:0.27
8:0.03
9:0.05
10:0.06
11:0.09
Negative Logits
François:-2.30
arrell:-2.13
Ré:-2.05
Trends:-2.04
tails:-2.01
Dele:-2.00
Coleman:-1.98
refere:-1.96
Tanner:-1.93
itaire:-1.91
Positive Logits
PB:2.35
honor:2.23
Xia:2.21
ox:2.17
ebus:2.14
pistol:2.04
CDC:2.01
ocaust:2.01
thia:2.01
bee:1.99
Activations Density 0.003%