    Explanations

    expressions of affection and love

    Negative Logits
    Token           Logit
    " mín"          -0.61
    " Whitfield"    -0.58
    " muligt"       -0.56
    " Perugia"      -0.56
    "unlikely"      -0.55
    " dolci"        -0.55
    " occurred"     -0.54
    " auraient"     -0.54
    " secours"      -0.53
    " restantes"    -0.53
    Positive Logits
    Token           Logit
    " loves"        1.17
    " loved"        1.13
    " love"         1.08
    "loves"         1.05
    "loved"         1.04
    " Loves"        1.03
    " liked"        1.00
    " hates"        0.97
    "Loves"         0.96
    " likes"        0.92
    Activation Density: 0.069%

    No Known Activations