Explanations
words expressing personal preferences or enjoyment
expressions of liking or loving something
Negative Logits
Token           Logit
DragonMagazine  -0.83
orthy           -0.83
arat            -0.80
onse            -0.77
AIDS            -0.77
arf             -0.77
Purchase        -0.77
til             -0.75
ainer           -0.74
ene             -0.73
Positive Logits
Token           Logit
seeing          1.03
experimenting   0.97
surprises       0.96
watching        0.96
interacting     0.87
having          0.86
talking         0.84
hearing         0.84
listening       0.83
simplicity      0.82
Activation density: 0.095%