INDEX
Explanations
web links, especially those starting with "www" or "https"
the presence of URLs or web links
Negative Logits
Token          Logit
caucuses       -0.82
deportation    -0.82
judgment       -0.76
grounding      -0.75
propositions   -0.74
Archdemon      -0.74
judgments      -0.73
Cabrera        -0.73
antagonist     -0.72
Maher          -0.71
Positive Logits
Token            Logit
youtube          1.86
[token missing]  1.84
amazon           1.63
[token missing]  1.62
daily            1.41
[token missing]  1.39
twitch           1.38
Downloadha       1.37
etsy             1.35
example          1.32
Activations Density 0.033%