Explanations
phrases related to enjoyment and pleasure
references to enjoying various activities and experiences
Negative Logits
ural        -0.74
pled        -0.65
raft        -0.64
dare        -0.64
ene         -0.64
ethical     -0.61
agnetic     -0.61
scrimmage   -0.61
carriers    -0.60
©¶æ         -0.59
Positive Logits
nels         0.87
ably         0.86
joy          0.85
enjoys       0.84
lihood       0.84
enjoyment    0.80
ĸļ           0.79
enjoyed      0.79
enjoying     0.78
quished      0.77
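The two logit lists are the tokens with the most positive and most negative weights for this feature. Below is a minimal sketch of that ranking step, using a subset of the weights listed here. It assumes the weights are already available as a token-to-value mapping; the helper name `top_k` is hypothetical, and how the page derived the weights is not stated here.

```python
# Hypothetical helper: rank tokens by a feature's logit weight.
# The values are a subset copied from the lists above; how they were
# computed upstream is an assumption outside this sketch.
logit_weights = {
    "ural": -0.74,
    "pled": -0.65,
    "raft": -0.64,
    "nels": 0.87,
    "ably": 0.86,
    "joy": 0.85,
    "enjoys": 0.84,
}

def top_k(weights, k, positive=True):
    """Return the k (token, weight) pairs with the largest (or smallest) weights."""
    items = sorted(weights.items(), key=lambda kv: kv[1], reverse=positive)
    return items[:k]

print(top_k(logit_weights, 3))                  # [('nels', 0.87), ('ably', 0.86), ('joy', 0.85)]
print(top_k(logit_weights, 3, positive=False))  # [('ural', -0.74), ('pled', -0.65), ('raft', -0.64)]
```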
Activation density: 0.031%
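The page does not say how density was measured. A common reading is the percentage of sampled tokens on which the feature fires, and the sketch below assumes exactly that: activation above a threshold of 0 over some token sample. The function name and threshold are illustrative assumptions. For example, 31 active tokens out of 100,000 would give 0.031%.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active; the source page does not define it explicitly.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Illustrative sample: 31 active tokens out of 100,000.
sample = [1.0] * 31 + [0.0] * (100_000 - 31)
print(activation_density(sample))
```

A feature this sparse fires on roughly 3 tokens in every 10,000, which fits its narrow enjoyment-related explanation.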