INDEX
    Explanations

    references to TV shows and popular culture

    Negative Logits
    -1.10
    
    
    -0.98
    /**
    -0.87
    <?
    -0.82
    <?
    
    -0.69
    <bos>
    -0.64
    /*
    -0.63
    #
    -0.60
     jakarta
    -0.58
    /***
    
    -0.57
    Positive Logits
     Bagdad    1.09
     Juf       1.06
     Khart     0.99
     Amerik    0.98
     Nguy      0.94
     thuy      0.94
     lele      0.94
     Keny      0.94
     Karang    0.93
     panik     0.93
    Act Density 0.604%

    No Known Activations