INDEX
Explanations
advertisements in a document
indicators of advertisement or promotional content
NEGATIVE LOGITS
ĪĴ  -0.76
opian  -0.72
rency  -0.72
ħĭ  -0.67
acea  -0.66
ashes  -0.63
fruitful  -0.63
uality  -0.62
amiya  -0.62
rite  -0.62
POSITIVE LOGITS
][/  0.76
WATCHED  0.74
sidx  0.68
eh  0.66
Jac  0.66
]"  0.66
advertisement  0.64
inately  0.63
Abrams  0.62
IMAGES  0.61
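The page does not say how these lists are computed. One common approach for dashboards like this is to project the feature's decoder direction through the model's unembedding matrix and report the tokens with the most negative and most positive scores. The sketch below assumes that approach. The names `top_logit_effects`, `decoder_row`, `unembed`, and `vocab` are hypothetical, and plain lists stand in for real model weights.

```python
def top_logit_effects(decoder_row, unembed, vocab, k=10):
    """Return (negative, positive) lists of (token, score) pairs.

    Assumption: score[j] = decoder_row . unembed[:, j], where decoder_row
    has d_model entries and unembed is d_model rows by len(vocab) columns.
    """
    d_model = len(decoder_row)
    # Direct effect of the feature direction on each vocabulary logit.
    scores = [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort token indices from most negative to most positive score.
    order = sorted(range(len(vocab)), key=lambda j: scores[j])
    negative = [(vocab[j], scores[j]) for j in order[:k]]
    positive = [(vocab[j], scores[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
neg, pos = top_logit_effects(
    decoder_row=[1.0, 0.0],
    unembed=[[0.5, -0.2, 0.9, -0.7], [3.0, 3.0, 3.0, 3.0]],
    vocab=["a", "b", "c", "d"],
    k=2,
)
print(neg)  # most decreased tokens first
print(pos)  # most increased tokens first
```

In practice the scores are often normalized or scaled before display, so the absolute values above need not match the magnitudes shown on a given page.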
Activation Density: 0.071%
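Activation density is usually read as the percentage of sampled tokens on which the feature fires. The sketch below assumes that "fires" means an activation strictly above a threshold of 0.0. The function name `activation_density` and the threshold parameter are hypothetical. Under this reading, 0.071% corresponds to about 71 firing tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 71 nonzero activations among 100,000 tokens gives a density of about 0.071%.
acts = [0.0] * 99_929 + [1.0] * 71
print(f"{activation_density(acts):.3f}%")
```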