INDEX
    Explanations

    expressions of love and affection

    Negative Logits
    "TagHelper"      -0.59
    " aandacht"      -0.59
    " Efq"           -0.57
    "xhttp"          -0.56
    " cdti"          -0.55
    "jooq"           -0.54
    " sexe"          -0.53
    " themſelves"    -0.51
    "ſel"            -0.51
    " poupée"        -0.51

    Positive Logits
    " loves"         1.31
    " loved"         1.11
    " LOVED"         1.05
    "loved"          0.97
    " Loves"         0.97
    "Loved"          0.96
    " hated"         0.96
    " likes"         0.96
    "loves"          0.95
    " liked"         0.94
    Act Density 0.112%

    No Known Activations