Explanations
actions related to scoring and winning in sports contexts
Negative Logits
ats             -0.18
atsu            -0.17
andom           -0.16
_stuff          -0.15
ATS             -0.15
standen         -0.15
canc            -0.14
.scalablytyped  -0.14
erti            -0.14
anta            -0.14
Positive Logits
maximum         0.25
hon             0.25
brag            0.24
precious        0.23
valuable        0.22
silver          0.21
pole            0.21
points          0.20
Maximum         0.20
pride           0.19
Activations Density 0.052%