INDEX
    Explanations

    phrases related to product descriptions

    Negative Logits
    Token           Logit
    " do"           -0.79
    " try"          -0.79
    " don"          -0.78
    " can"          -0.77
    " get"          -0.77
    ","             -0.75
    " й"            -0.75
    "<bos>"         -0.73
    " continue"     -0.73
    " for"          -0.73

    Positive Logits
    Token           Logit
    " Wel"          2.51
    "Wel"           2.27
    " WEL"          2.16
    " fta"          2.11
    " fte"          2.10
    " effe"         2.08
    " ftu"          2.02
    " §."           1.98
    " secon"        1.98
    " applau"       1.97

    Activation Density: 0.139%

    No Known Activations