INDEX
Explanations
phrases indicating quantities or comparisons
Head Attr Weights
0:0.02
1:0.02
2:0.11
3:0.06
4:0.23
5:0.03
6:0.14
7:0.14
8:0.03
9:0.04
10:0.07
11:0.07
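
The head attribution weights listed above can be ranked to see which attention heads dominate. The sketch below is illustrative only: the values are copied from the listing, while the names `head_attr_weights` and `rank_heads` are hypothetical and not part of any dashboard API.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_attr_weights = {
    0: 0.02, 1: 0.02, 2: 0.11, 3: 0.06, 4: 0.23, 5: 0.03,
    6: 0.14, 7: 0.14, 8: 0.03, 9: 0.04, 10: 0.07, 11: 0.07,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted from largest to smallest weight.

    Python's sort is stable, so heads with equal weights keep their
    original (ascending index) order.
    """
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_attr_weights)
top_head, top_weight = ranked[0]

# Sum of the displayed weights, rounded to avoid floating-point noise.
total = round(sum(head_attr_weights.values()), 2)

for head, weight in ranked[:3]:
    print(f"head {head}: {weight:.2f}")
print(f"sum of listed weights: {total:.2f}")
```

Head 4 carries the largest listed weight (0.23), followed by heads 6 and 7 (0.14 each). The listed weights sum to 0.96 rather than 1.00, which may simply reflect rounding to two decimals in the display.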
Negative Logits
accessed     -1.42
Units        -1.35
Attacks      -1.32
Females      -1.31
Males        -1.31
assembled    -1.29
ALSO         -1.28
guards       -1.27
Cells        -1.25
rium         -1.23
Positive Logits
convinc                    1.65
enough                     1.54
henko                      1.54
ensical                    1.52
BuyableInstoreAndOnline    1.51
ibaba                      1.44
successfully               1.44
agra                       1.40
ingen                      1.39
satisf                     1.38
Activations Density 0.011%