INDEX
Explanations
mentions of advertising
references to advertising
Negative Logits
hew         -0.73
zh          -0.70
riot        -0.69
ious        -0.67
Sequence    -0.66
orable      -0.66
Stevens     -0.66
rior        -0.66
ave         -0.64
aves        -0.64
Positive Logits
advertising       3.61
Advertising       2.67
advertisements    2.58
ads               2.36
advertisement     2.31
marketing         2.13
advertisers       2.12
advert            1.96
commercials       1.96
advertis          1.96
Activations Density 0.005%