INDEX
Explanations
phrases indicating a significant impact or improvement
phrases expressing a significant positive impact or benefit
Negative Logits
Token       Logit
esome       -0.76
Tags        -0.75
smoking     -0.73
arians      -0.70
nar         -0.69
ties        -0.68
yx          -0.67
Notting     -0.67
iris        -0.67
onomy       -0.67
Positive Logits
Token         Logit
appreciated   1.00
benefited     0.94
amounts       0.84
appreci       0.82
appreciate    0.77
appre         0.77
impacting     0.75
impacted      0.75
effected      0.75
simpl         0.74
Activations Density: 0.016%