INDEX
    Explanations

    instances of the word "won."

    Negative Logits
    cia       -0.15
    bos       -0.14
     Kral     -0.14
    ather     -0.14
    bian      -0.14
    å±ŀ       -0.14
    yll       -0.14
    thon      -0.13
    cki       -0.13
    uce       -0.13
    Positive Logits
    nable      0.23
     hearts    0.20
    now        0.20
     battles   0.18
    ipeg       0.17
    REP        0.16
    -win       0.15
    atur       0.14
    emem       0.14
    ê¶Į        0.14
    Act Density 0.036%

    No Known Activations