INDEX
    Explanations

    words related to winning and victory

    Negative Logits
    Token         Logit
    yll           -0.17
    iagnostics    -0.16
    illon         -0.16
    cia           -0.16
    eb            -0.16
    illo          -0.15
    ASON          -0.15
    bian          -0.15
    wine          -0.15
    wald          -0.15
    Positive Logits
    Token         Logit
    nable          0.31
    -win           0.23
    ning           0.23
    now            0.23
    throp          0.21
    ograd          0.20
    ery            0.20
    ners           0.19
    -loss          0.19
    try            0.18
    Activation Density: 0.050%

    No Known Activations