INDEX
    Explanations

    acts of physical affection

    Negative Logits
    orio
    -0.10
     flirt
    -0.10
    oley
    -0.09
    ／／
    -0.09
     sovereign
    -0.09
     cruel
    -0.09
    uisine
    -0.09
    ialis
    -0.09
     Toy
    -0.08
    arias
    -0.08
    Positive Logits
     hug
    0.16
    拥
    0.16
    抱
    0.15
     pet
    0.15
     attention
    0.14
     rub
    0.12
     touch
    0.12
     pat
    0.12
     lav
    0.12
     lap
    0.12
    Activation Density 0.068%

    No Known Activations