INDEX
Explanations
dates and locations mentioned in news articles
date and location references specific to news events
Negative Logits
nesia         -0.61
wo            -0.59
Hearts        -0.58
Romans        -0.58
typo          -0.57
tune          -0.57
treaties      -0.56
poke          -0.56
Italians      -0.56
unanswered    -0.51
Positive Logits
ember         0.71
arity         0.71
tnc           0.69
REUTERS       0.69
Aug           0.68
interstitial  0.68
window        0.66
thumbnails    0.65
Feb           0.63
EMBER         0.62
Activations Density 0.078%