INDEX
Explanations
capital letters or initialisms
Head Attr Weights
0:0.07
1:0.07
2:0.13
3:0.04
4:0.03
5:0.05
6:0.10
7:0.05
8:0.03
9:0.04
10:0.29
11:0.05
Negative Logits
Stronghold  -2.46
Wonders     -2.43
Soy         -2.39
Unt         -2.39
Tant        -2.39
Aster       -2.36
Secrets     -2.31
Misty       -2.25
gent        -2.23
Priv        -2.23
Positive Logits
HL          4.08
HL          2.92
hal         2.86
igl         2.64
emort       2.60
NHL         2.53
iverpool    2.50
jri         2.47
onga        2.38
LET         2.36
Activations Density 0.000%