INDEX
    Explanations

    mentions of negative sentiments or controversy, such as displeasure and buzz surrounding a topic

    Negative Logits
    Token                     Logit
    "<bos>"                   -2.72
    (blank)                   -0.77
    "<?"                      -0.76
    "/***" + whitespace       -0.75
    (whitespace only)         -0.73
    "/*"                      -0.64
    "//{" + whitespace        -0.63
    "/**"                     -0.58
    "<?" + whitespace         -0.57
    " deliver"                -0.57
    Positive Logits
    Token          Logit
    " Khart"       1.41
    " Juf"         1.34
    " fortn"       1.27
    " Keny"        1.26
    " unce"        1.23
    " secon"       1.22
    " Muhamma"     1.19
    " Minang"      1.18
    " Sarm"        1.18
    " inext"       1.17
    Activation Density: 1.786%

    No Known Activations