INDEX
    Explanations

    expressions of happiness and positive emotions

    Negative Logits
    "aload"       -0.15
    "ROP"         -0.14
    "opi"         -0.14
    "evin"        -0.14
    "hti"         -0.14
    "anik"        -0.14
    "eg"          -0.13
    "oss"         -0.13
    "oping"       -0.13
    "ain"         -0.13

    Positive Logits
    " about"       0.21
    " to"          0.18
    "overall"      0.17
    " overall"     0.17
    "kul"          0.16
    "ä¹İ"          0.15
    "ritel"        0.15
    "Ñĥв"          0.15
    "About"        0.15
    "irty"         0.15

    Act Density 0.043%

    No Known Activations