    Negative Logits
    Token                 Logit
    (whitespace token)    -0.08
    "pan"                 -0.07
    "phant"               -0.07
    "insula"              -0.07
    " iid"                -0.07
    " ürün"               -0.07
    "Product"             -0.07
    "_product"            -0.07
    " allow"              -0.07
    "nl"                  -0.07
    Positive Logits
    Token                 Logit
    "回复"                0.09
    " soci"               0.08
    " 回复"               0.08
    " pretending"         0.08
    " reply"              0.08
    "回应"                0.08
    "返信"                0.07
    " છો"                 0.07
    " objections"         0.07
    " packet"             0.07
    Activation Density: 0.003%

    No Known Activations