INDEX
    Explanations

    expressions of love and affection

    Negative Logits
    Token            Logit
    " Conſ"          -0.64
    "ء"              -0.62
    " himſelf"       -0.61
    " Nuovo"         -0.60
    "Unnamed"        -0.59
    " syscall"       -0.59
    " Dons"          -0.59
    " Majefty"       -0.58
    " ſame"          -0.57
    " themſelves"    -0.57
    Positive Logits
    Token            Logit
    " love"          0.84
    "ValueStyle"     0.82
    " dislike"       0.78
    " hate"          0.77
    " loves"         0.77
    " loved"         0.74
    " senang"        0.74
    " liked"         0.74
    " hated"         0.72
    " hates"         0.72
    Activation Density: 0.103%

    No Known Activations