INDEX
    Explanations

    expressions of love and affection

    Negative Logits
    uman          -0.16
    ootball       -0.15
    ization       -0.14
    xing          -0.14
    ols           -0.14
    andon         -0.14
    emiz          -0.14
    ks            -0.14
    ophile        -0.13
    sexual        -0.13
    Positive Logits
    ably          0.19
    eat           0.18
    joy           0.17
    fully         0.17
    kind          0.17
    edException   0.17
     affair       0.16
    _errno        0.15
    full          0.15
    leigh         0.14
    Act Density 0.080%

    No Known Activations