INDEX
Explanations
specific references to groups of people and actions related to accountability or consequence
Head Attr Weights
0:0.02
1:0.02
2:0.17
3:0.12
4:0.08
5:0.05
6:0.18
7:0.03
8:0.06
9:0.09
10:0.07
11:0.05
Negative Logits
Polo      -1.38
Chase     -1.30
Catalyst  -1.26
[*        -1.20
Rockies   -1.17
ARP       -1.15
scanning  -1.15
Cros      -1.14
Haas      -1.14
ADRA      -1.12
Positive Logits
"]=>                     1.76
basketball               1.56
sqor                     1.46
BuyableInstoreAndOnline  1.43
david                    1.42
usa                      1.38
};                       1.38
malink                   1.36
english                  1.36
andre                    1.32
Activations Density 0.016%