INDEX
    Explanations

    phrases related to accusations or claims of wrongdoing

    Negative Logits
    <bos>           -2.94
    (not shown)     -0.67
    /***            -0.63
    //<             -0.58
    displayquote    -0.58
    HasKey          -0.58
    tw              -0.58
    //{             -0.57
    win             -0.57
    (whitespace)    -0.57
    Positive Logits
     Juf        1.76
     Khart      1.67
     Minang     1.66
     thut       1.60
     fta        1.50
     bandung    1.50
     jaya       1.48
     accla      1.48
     aen        1.48
     increa     1.46
    Activation Density: 0.045%

    No Known Activations