INDEX
Explanations
specific formatting patterns in news articles, such as author names followed by specific symbols
indicative phrases or components related to citations or references in articles
Negative Logits
charm            -0.67
endeav           -0.66
Quart            -0.65
charms           -0.60
etta             -0.59
twe              -0.57
professionally   -0.57
bomb             -0.57
rox              -0.57
lc               -0.56
Positive Logits
Related          0.80
Related          0.76
eve              0.72
EVA              0.71
Recent           0.70
Washington       0.68
igham            0.67
SPONSORED        0.66
BBC              0.66
Âł Âł            0.66
Activations Density 0.173%