INDEX
    Explanations

    references to advertisements and advertising practices

    Negative Logits
    Token             Logit
    " thermique"      -0.62
    " inocente"       -0.58
    " calcetines"     -0.58
    " pegatina"       -0.57
    " Komunikasi"     -0.57
    " Nacionales"     -0.57
    " tatuajes"       -0.56
    " Económica"      -0.56
    " Nuestros"       -0.56
    "เยอะ"            -0.55
    Positive Logits
    Token             Logit
    " Ads"            0.93
    " ads"            0.87
    " ADS"            0.84
    "Cop"             0.81
    " Cop"            0.79
    " cop"            0.79
    "Ads"             0.74
    "ads"             0.73
    " ray"            0.71
    " ass"            0.69
    Activation Density: 0.272%

    No Known Activations