INDEX
    Explanations

    human expressions and reactions

    Negative Logits
     sürekli        0.38
                    0.38
    тров            0.37
     screaming      0.36
     continually    0.34
    ለያ              0.34
     continuously   0.34
    解决了          0.34
    /*/             0.33
     अक्टूबर        0.33
    Positive Logits
     smiled         1.01
     nodded         0.97
     chuckled       0.95
     replied        0.92
     laughed        0.91
     chuckle        0.91
     smile          0.90
     shrug          0.85
     reply          0.85
     nodding        0.85
    Act Density 0.027%

    No Known Activations