INDEX
    Explanations

    expressions of affection and emotional connections between characters

    Negative Logits
    Token          Logit
    "ultipart"     -0.16
    "upe"          -0.15
    "abol"         -0.15
    "unik"         -0.15
    " Exposure"    -0.14
    "ifter"        -0.14
    "amer"         -0.14
    "åī²" (割)      -0.14
    "@js"          -0.14
    " bev"         -0.14
    Positive Logits
    Token          Logit
    " hug"         0.26
    " arms"        0.25
    " embrace"     0.24
    " hugged"      0.23
    " arm"         0.22
    "/arm"         0.21
    " embraced"    0.19
    "æĬ±" (抱)      0.18
    " embraces"    0.18
    " hugs"        0.18
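
The entries "åī²" and "æĬ±" in the logit lists look like byte-level BPE renderings of multi-byte UTF-8 characters rather than literal text. The sketch below assumes a GPT-2-style byte-to-character table (an assumption; the source does not name the tokenizer) and maps such a token string back to readable text. The function names are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string back into UTF-8 text."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a literal space shown by
            # the dashboard) are passed through unchanged.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


print(decode_token("æĬ±"))  # 抱
print(decode_token("åī²"))  # 割
```

Under that assumption, "æĬ±" decodes to 抱 ("to hug, to embrace"), which fits the other positive-logit tokens, and "åī²" decodes to 割 ("to cut").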
    Activation Density: 0.132%
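
One common reading of the density figure, assumed here because the source does not define it, is the fraction of sampled tokens on which the feature has a nonzero activation. A minimal sketch with a hypothetical helper and synthetic activations:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: the feature fires on 132 of 100,000 tokens.
sample = [1.0] * 132 + [0.0] * (100_000 - 132)
print(f"{activation_density(sample):.3%}")  # 0.132%
```

On this reading, a density of 0.132% corresponds to roughly 132 active tokens per 100,000 sampled.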

    No Known Activations