    Explanations

    expressions of assurance, intention, and conditions related to events or decisions

    Negative Logits
    tdown
    -0.16
    urette
    -0.16
    欠
    -0.15
    ení
    -0.15
    illez
    -0.15
    uhan
    -0.15
    омен
    -0.15
     UNKNOWN
    -0.14
    Photon
    -0.14
    isay
    -0.14
    Positive Logits
     won
    1.00
    won
    0.90
     Won
    0.89
    Won
    0.82
     WON
    0.60
     wont
    0.59
     wouldn
    0.56
    不会
    0.50
     Wouldn
    0.43
     unlikely
    0.43
    Act Density 0.372%

    No Known Activations