INDEX
    Explanations

    differences and discrepancies in descriptions, potentially related to product reviews

    Negative Logits
    nan       -0.72
    etts      -0.71
    lish      -0.71
    oire      -0.68
    vance     -0.68
    naire     -0.68
    uto       -0.67
    onds      -0.67
    éĹĺ       -0.66
    kie       -0.66
    Positive Logits
     beware          1.05
     alas            0.99
     unfortunately   0.98
     downside        0.93
     lacks           0.88
     hindered        0.88
     drawbacks       0.84
     lacked          0.80
     hampered        0.79
     lacking         0.78
    Activation Density: 0.360%

    No Known Activations