INDEX
    Explanations

    instances of the word "love" and its variations in various contexts

    Negative Logits
    Token      Logit
    es         -0.23
    ez         -0.21
    ej         -0.20
    ek         -0.19
    eo         -0.18
    esin       -0.18
    esine      -0.17
    eses       -0.17
    ele        -0.16
    eki        -0.16
    Positive Logits
    Token      Logit
    ewise      0.23
    emaker     0.21
    etime      0.21
    ethe       0.21
    ings       0.20
    eman       0.19
    ETIME      0.19
    INGS       0.19
    eworthy    0.18
    ewis       0.18
    Activation Density: 0.110%

    No Known Activations