INDEX
Explanations
phrases indicating strong positive appreciation or approval
expressions of significant appreciation or benefit
Negative Logits
esome      -0.71
Notting    -0.71
aple       -0.70
ulia       -0.70
smoking    -0.69
Tags       -0.69
nar        -0.66
ties       -0.66
onomy      -0.65
tein       -0.65
Positive Logits
appreciated    0.94
benefited      0.94
appreci        0.85
appre          0.84
amounts        0.79
depends        0.77
appreciate     0.76
impacting      0.75
benef          0.73
effected       0.73
Activations Density: 0.017%