INDEX
Explanations
words related to health and well-being
terms related to beneficial or detrimental effects
Negative Logits
buck     -0.82
oute     -0.72
pler     -0.64
metal    -0.63
ascus    -0.62
hyde     -0.62
rine     -0.61
bish     -0.61
herer    -0.60
Tube     -0.60
Positive Logits
outweigh      0.77
icial         0.77
outcomes      0.73
effects       0.72
synerg        0.72
influence     0.71
nerg          0.70
influences    0.69
effect        0.68
Advantage     0.68
Activations Density: 0.079%