INDEX
Explanations
words relating to the concept of pleasure or satisfaction
words related to pleasure or things that bring enjoyment
Negative Logits
Token         Logit
DERR          -0.81
master        -0.80
lished        -0.79
worthiness    -0.76
masters       -0.75
urst          -0.74
ilities       -0.70
hips          -0.69
rait          -0.68
ility         -0.68
Positive Logits
Token         Logit
asure         1.10
asers         1.07
asures        1.01
bian          0.97
ases          0.94
vere          0.89
asing         0.87
conom         0.86
fter          0.84
ptic          0.84
Activations Density 0.038%