INDEX
Explanations
references to news reports or coverage
instances of the word "report" and its variations in the text
Negative Logits
ĸļ            -0.83
ophers        -0.73
antics        -0.73
uit           -0.70
conservancy   -0.69
etsk          -0.68
rongh         -0.67
uits          -0.66
palate        -0.65
ophone        -0.64
Positive Logits
è¦ļéĨĴ        0.89
quoting       0.84
reports       0.83
age           0.83
sources       0.80
quotes        0.80
WASHINGTON    0.77
rumors        0.76
leaked        0.76
quoted        0.75
Activations Density 0.055%
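
Two entries in the logit lists above (`ĸļ` and `è¦ļéĨĴ`) are not ordinary word pieces. They look like the printable stand-ins that byte-level BPE vocabularies use for raw bytes. The sketch below assumes a GPT-2 style byte-to-character table, which this page does not confirm, and maps such a display string back to its bytes. The names `byte_decoder`, `to_bytes`, and `decode_token` are hypothetical helpers introduced for illustration.

```python
def byte_decoder():
    """Build a character-to-byte table, assuming the GPT-2 style convention.

    Printable Latin-1 bytes stand for themselves; every other byte is
    displayed as chr(256 + k), where k counts the non-printable bytes in order.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {chr(b): b for b in printable}
    shift = 0
    for b in range(256):
        if b not in printable:
            table[chr(256 + shift)] = b
            shift += 1
    return table


DECODER = byte_decoder()


def to_bytes(token):
    """Map a displayed token string back to the raw bytes it stands for."""
    return bytes(DECODER[ch] for ch in token)


def decode_token(token):
    """Decode a displayed token as UTF-8; invalid sequences become U+FFFD."""
    return to_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for shown in ["è¦ļéĨĴ", "ĸļ", "reports"]:
        print(repr(shown), to_bytes(shown), repr(decode_token(shown)))
```

Under this assumption the top positive-logit token decodes to the two-character string 覚醒, while `ĸļ` is a pair of UTF-8 continuation bytes (0x96 0x9A) that does not form a complete character on its own. Plain ASCII tokens such as `reports` pass through unchanged.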