INDEX
Explanations
words related to the concept of 'win' and associations involving competition or ranking
Negative Logits
nya     -0.31
ma      -0.31
ro      -0.27
ness    -0.26
ne      -0.26
no      -0.26
me      -0.26
du      -0.25
ning    -0.25
pu      -0.25
Positive Logits
'nun     0.30
’nun     0.28
gether   0.23
ffset    0.21
xygen    0.21
ymous    0.21
ceph     0.20
ject     0.20
herent   0.19
ptions   0.19
Activations Density: 0.467%