INDEX
    Explanations

    expressions of personal preferences and feelings

    followed by positive sentiment

    expressing strong positive feelings

    Negative Logits
    Token            Logit
    "клопе"          -0.52
    "<!--["          -0.48
    " مشين"          -0.47
    "LEP"            -0.45
    " Aware"         -0.44
    "alay"           -0.44
    " esperienze"    -0.44
    "IPT"            -0.42
    "Aware"          -0.42
    " GAZ"           -0.42
    Positive Logits
    Token            Logit
    " liked"         1.95
    " love"          1.86
    " loved"         1.84
    " liking"        1.78
    " loves"         1.74
    " LOVE"          1.59
    " LOVED"         1.57
    " likes"         1.53
    "loved"          1.50
    "liked"          1.49
    Activation Density: 0.369%
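The page does not say how the density figure is computed. A minimal sketch, assuming it is the percentage of token positions on which the feature's activation exceeds zero, is shown below. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions where the activation exceeds threshold.

    Assumption: "density" means the share of token positions on which the
    feature fires at all; the source page does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations for illustration only (not data from this feature):
# 997 silent positions and 3 positions where the feature fires.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(sample)
print(f"Activation density: {density:.3f}%")
```

Under that assumption, a density of 0.369% would correspond to the feature firing on roughly 37 of every 10,000 token positions.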

    No Known Activations