INDEX
Explanations
references to historical figures and their accomplishments
Head Attr Weights
0:0.03
1:0.50
2:0.03
3:0.05
4:0.03
5:0.07
6:0.04
7:0.03
8:0.03
9:0.05
10:0.04
11:0.04
Negative Logits
acs           -2.87
someday       -2.54
アル          -2.53
Sloven        -2.46
option        -2.42
achievable    -2.35
Cooke         -2.34
isson         -2.33
Courage       -2.27
Sao           -2.26
Positive Logits
17            4.45
17            4.39
18            3.44
18            3.44
eteenth       3.38
eighteenth    3.27
1700          3.20
16            3.13
eteen         3.13
uko           3.12
Activations Density 0.008%