    Explanations

    words expressing personal preferences or enjoyment

    expressions of liking or loving something

    Negative Logits
    Token               Logit
    "DragonMagazine"    -0.83
    "orthy"             -0.83
    "arat"              -0.80
    "onse"              -0.77
    "AIDS"              -0.77
    "arf"               -0.77
    "Purchase"          -0.77
    "til"               -0.75
    "ainer"             -0.74
    "ene"               -0.73
    Positive Logits
    Token               Logit
    " seeing"           1.03
    " experimenting"    0.97
    " surprises"        0.96
    " watching"         0.96
    " interacting"      0.87
    " having"           0.86
    " talking"          0.84
    " hearing"          0.84
    " listening"        0.83
    " simplicity"       0.82
    Activation Density: 0.095%

    No Known Activations