INDEX
Explanations
phrases expressing feelings of reward, satisfaction, and emotional responses to experiences
Negative Logits
Token         Logit
legal         -0.64
Phill         -0.62
Newsletter    -0.60
Bridge        -0.59
Writer        -0.59
Roy           -0.58
Ashton        -0.58
hawks         -0.58
merger        -0.57
umar          -0.56
Positive Logits
Token         Logit
yourself      1.16
yourselves    0.98
Yourself      0.83
wasting       0.76
temptation    0.74
your          0.74
wondering     0.73
ウス          0.72
wiser         0.71
forgiven      0.70
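Token strings in a byte-level BPE vocabulary store non-ASCII text as stand-in characters, so the Japanese token in the positive list appears in raw vocabulary form as `ãĤ¦ãĤ¹`. The sketch below shows one way to decode such strings. It assumes the model uses the GPT-2 style byte-to-unicode mapping, which this page does not state; the helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a raw vocabulary string back into readable UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A single token may hold only part of a multi-byte character,
    # so invalid sequences are replaced rather than raising.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ¦ãĤ¹"))
```

Plain ASCII tokens such as `yourself` pass through unchanged, since printable ASCII bytes map to themselves in this table.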
Activations Density 0.264%
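As a rough illustration of how a density figure like this could be computed, the sketch below takes density to be the fraction of sampled tokens on which the feature's activation is above zero. That definition is an assumption, since the page does not say how the figure is derived, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: density = (active tokens) / (all sampled tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustrative data only: 264 active tokens out of 100,000 sampled.
sample = [1.0] * 264 + [0.0] * (100_000 - 264)
print(f"{activation_density(sample):.3%}")
```

Under this assumed definition, a density of 0.264% would mean the feature fires on roughly 1 in 380 tokens.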