    Explanations

    phrases expressing personal preferences or enjoyment

    expressions of preference or liking

    Negative Logits
    Token         Logit
    "arta"        -0.80
    "tek"         -0.72
    "irrel"       -0.71
    "krit"        -0.70
    "cession"     -0.69
    "ilion"       -0.69
    "inary"       -0.68
    "Login"       -0.68
    "INAL"        -0.64
    "PATH"        -0.64
    Positive Logits
    Token               Logit
    " seeing"           1.00
    " experimenting"    0.91
    " watching"         0.91
    " surprises"        0.88
    " interacting"      0.87
    " hearing"          0.87
    " having"           0.86
    " to"               0.86
    " talking"          0.79
    "ably"              0.78
    Activation Density: 0.075%

    No Known Activations