INDEX
    Explanations

    Positive emotion

    Negative Logits
    Token                   Logit
    "σίας"                  -0.07
    "spr"                   -0.07
    " MUST"                 -0.06
    " handy"                -0.06
    "reject"                -0.06
    (token text missing)    -0.06
    "mites"                 -0.06
    "ака"                   -0.06
    " Hund"                 -0.06
    "toJson"                -0.06
    Positive Logits
    Token                   Logit
    " grateful"             0.07
    " understood"           0.06
    " collider"             0.06
    " cowork"               0.06
    " intellig"             0.06
    " Gig"                  0.06
    " catalogs"             0.06
    "ограф"                 0.06
    "enta"                  0.06
    "etik"                  0.06
    Activation Density: 0.030%

    No Known Activations