INDEX
    Explanations

    expressions of happiness or positivity

    Negative Logits
    token                 logit
                          -0.51
    "}),"                 -0.48
    "}}}{"                -0.45
    "'];"                 -0.43
    "]]"                  -0.42
    " }}\"                -0.42
    "'>"                  -0.41
    " Administrativna"    -0.41
    "'));"                -0.41
    "];"                  -0.40
    Positive Logits
    token                 logit
    " :)"                 1.66
    " ;)"                 1.27
    " :)."                1.23
    " :-)"                1.16
    " :("                 1.15
    " :)</"               1.12
    " :))"                1.10
    "!:)"                 1.08
    ":)"                  1.06
    " :)))"               1.05
    Act Density 0.010%

    No Known Activations