INDEX
Explanations
news-related entities or events mentioned in reports
phrases related to reporting and attribution of information
Negative Logits
istries     -0.88
rontal      -0.87
aden        -0.76
etheless    -0.74
uga         -0.72
Cola        -0.72
teness      -0.69
joice       -0.67
cking       -0.67
incial      -0.66
Positive Logits
by          0.92
herein      0.84
above       0.83
evidenced   0.80
previously  0.78
below       0.77
exempl      0.70
BY          0.69
Kurd        0.69
happens     0.67
Activations Density: 0.063%