INDEX
    Explanations
    Negative Logits
    Token            Logit
    "(Temp"          -0.07
    " Drinking"      -0.06
    " plea"          -0.06
    (blank)          -0.06
    "_threshold"     -0.06
    " wanna"         -0.06
    "负责任"         -0.06
    "AGAIN"          -0.06
    "得了"           -0.06
    " Leading"       -0.06
    Positive Logits
    Token                                                                 Logit
    "//================================================================"  0.07
    (blank)                                                               0.07
    " microbi"                                                            0.07
    "atten"                                                               0.07
    "保养"                                                                0.07
    "Calculate"                                                           0.07
    "c"                                                                   0.07
    " expansive"                                                          0.07
    "当地政府"                                                            0.07
    "dated"                                                               0.07
    Act Density 0.004%

    No Known Activations