INDEX
    Explanations

    references to economic exploitation and slavery

    Negative Logits
    (blank)               -0.76
    " بيها"               -0.63
    " numerus"            -0.62
    "AddHtmlAttribute"    -0.60
    "illées"              -0.59
    "itieren"             -0.55
    "hamilan"             -0.55
    " typhoid"            -0.54
    "miert"               -0.54
    " saliva"             -0.54
    Positive Logits
    " surplus"            0.76
    " leftover"           0.63
    " discarded"          0.61
    "queryInterface"      0.56
    " repur"              0.55
    " Surplus"            0.53
    " rejected"           0.52
    "掉的"                0.51
    "Lef"                 0.50
    " unwanted"           0.50
    Activation Density 0.285%

    No Known Activations