Explanations
words related to various news topics or announcements
repeated references to news or news-related content
Negative Logits
randomized      -0.70
calibration     -0.67
marked          -0.67
mapping         -0.66
magnetic        -0.66
quart           -0.65
*               -0.64
determining     -0.64
psychologist    -0.64
compensated     -0.63
Positive Logits
ews             5.22
ew              1.72
EW              1.22
ools            1.09
ewski           1.02
ovie            0.95
aughs           0.93
edia            0.88
ufact           0.88
ornings         0.86
Activations Density: 0.010%