INDEX
    Explanations

    words related to the concept of 'win' and associations involving competition or ranking

    Negative Logits
    nya       -0.31
    ma        -0.31
    ro        -0.27
    ness      -0.26
    ne        -0.26
    no        -0.26
    me        -0.26
    du        -0.25
    ning      -0.25
    pu        -0.25

    Positive Logits
    'nun      0.30
    ’nun      0.28
    gether    0.23
    ffset     0.21
    xygen     0.21
    ymous     0.21
    ceph      0.20
    ject      0.20
    herent    0.19
    ptions    0.19

    Activation Density: 0.467%

    No Known Activations