Explanations
references to news outlets or news-related terms
instances of the word "news" and its variants
Negative Logits
equival    -0.69
ãĤ©        -0.68
argon      -0.67
é¾         -0.66
verages    -0.63
DEBUG      -0.63
DoS        -0.63
IPM        -0.61
ashtra     -0.61
pmwiki     -0.58
Positive Logits
leans      0.79
izons      0.74
angled     0.70
pires      0.70
pac        0.65
lisher     0.65
phrine     0.65
enment     0.64
ilings     0.64
enhagen    0.63
Activations Density 0.155%