INDEX
Explanations
URLs and website-related terms
numeric values or references to URLs
Negative Logits
frontline    -0.71
silenced     -0.67
spitting     -0.65
encomp       -0.65
etheless     -0.64
orchestr     -0.64
marching     -0.64
axe          -0.63
triggered    -0.62
union        -0.61
Positive Logits
Advertisement     1.55
RAW               1.24
advertisement     1.23
Newsletter        1.17
Advertisements    1.17
Another           1.12
Article           1.12
Spons             1.11
About             1.11
More              1.10
Activations Density: 0.786%
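
The density figure is presumably the share of sampled token positions on which this feature has a nonzero activation. The page does not state the exact definition, so that reading is an assumption. Under it, a minimal sketch of the calculation could look like the following, where `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and not taken from the source:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data, made up for illustration: 1 active position out of 8.
toy = [0.0, 0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.0]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 12.500%
```

Under this reading, a value of 0.786% would mean the feature is active on fewer than one in a hundred sampled token positions.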