    Explanations

    expressions of love and affection

    Negative Logits
    "umb"       -0.16
    "iaz"       -0.15
    "sexual"    -0.15
    "виж"       -0.15
    "ivism"     -0.15
    "stroy"     -0.15
    "sus"       -0.15
    "una"       -0.15
    "que"       -0.15
    "idal"      -0.14

    Positive Logits
    " affair"    0.21
    "joy"        0.20
    "/lo"        0.19
    "ably"       0.18
    "lessly"     0.18
    "ingly"      0.17
    " Hate"      0.17
    "/h"         0.16
    "eat"        0.16
    "able"       0.16

    Activation Density: 0.080%

    No Known Activations