Explanations
differences and discrepancies in descriptions, potentially related to product reviews
Negative Logits
nan      -0.72
etts     -0.71
lish     -0.71
oire     -0.68
vance    -0.68
naire    -0.68
uto      -0.67
onds     -0.67
éĹĺ      -0.66
kie      -0.66
Positive Logits
beware         1.05
alas           0.99
unfortunately  0.98
downside       0.93
lacks          0.88
hindered       0.88
drawbacks      0.84
lacked         0.80
hampered       0.79
lacking        0.78
Activations Density 0.360%