INDEX
Explanations
abbreviations and acronyms
Head Attr Weights
0:0.03
1:0.02
2:0.06
3:0.04
4:0.04
5:0.03
6:0.50
7:0.04
8:0.04
9:0.06
10:0.06
11:0.04
Negative Logits
Cheong      -1.20
Gry         -1.16
hesitation  -1.14
Ary         -1.10
itch        -1.09
Yanuk       -1.08
Vox         -1.08
Whites      -1.08
ndra        -1.08
unused      -1.08
Positive Logits
illi        1.65
arios       1.43
sie         1.39
udo         1.34
addr        1.29
ciation     1.26
Cong        1.26
Powered     1.25
士          1.23
etary       1.22
Activations Density: 0.005%