INDEX
Explanations
positive adjectives or phrases indicating benefits or advantages
phrases indicating positive outcomes or benefits
Negative Logits
Token       Logit
iper        -0.73
Downloadha  -0.72
eters       -0.70
racuse      -0.68
opers       -0.67
pter        -0.66
illon       -0.65
Bow         -0.64
asket       -0.63
hyde        -0.63
Positive Logits
Token       Logit
enough      1.16
enough      1.08
Enough      0.83
bye         0.82
nat         0.80
karma       0.77
optics      0.72
additions   0.70
news        0.70
surpr       0.70
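The logit lists above rank tokens by how strongly this feature pushes their output logits up or down. A common way to produce such lists is to dot the feature's decoder direction with each token's unembedding vector and take the k highest and k lowest scores. The sketch below shows that idea in plain Python under that assumption; `logit_effects`, `top_and_bottom` and the toy vocabulary are hypothetical, and the dashboard may normalize or post-process its scores differently. Scores are kept per token id rather than per string, because distinct token ids can render to the same text.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed_rows: Sequence[Sequence[float]]) -> List[float]:
    # Assumption: a feature's effect on token t's logit is the dot product of
    # the feature's decoder direction with token t's unembedding vector.
    # One score per token id, so duplicate surface strings stay separate.
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed_rows]


def top_and_bottom(vocab: Sequence[str], scores: Sequence[float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort (token, score) pairs ascending by score.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    positive = ranked[::-1][:k]  # k largest, descending
    negative = ranked[:k]        # k smallest, ascending
    return positive, negative


if __name__ == "__main__":
    # Hypothetical 2-dimensional toy model with a 4-token vocabulary.
    vocab = ["enough", "bye", "iper", "news"]
    unembed_rows = [[1.0, 0.2], [0.5, 0.5], [-0.6, -0.2], [0.3, 0.0]]
    decoder_dir = [1.0, 0.5]
    pos, neg = top_and_bottom(vocab, logit_effects(decoder_dir, unembed_rows), k=2)
    print("positive:", pos)
    print("negative:", neg)
```

In this toy example "enough" ranks highest and "iper" lowest, which mirrors the shape of the lists above but is not derived from them.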
Activation density: 0.124%
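Activation density is usually read as the fraction of token positions on which the feature fires. Under the assumption that "fires" means an activation strictly above zero (the threshold is a hypothetical parameter here), the computation is a simple count; a density of 0.124% corresponds to roughly 124 firing positions per 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    # Assumption: a position counts as "active" when its activation
    # exceeds the threshold (0.0 by default).
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 124 active positions out of 100,000 tokens.
    acts = [1.0] * 124 + [0.0] * (100_000 - 124)
    print(f"{activation_density(acts):.3%}")
```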