Explanations
specific mentions of dates and organizations, possibly related to news or events
the word "the" and its repeated occurrences in textual contexts
Negative Logits
indal         -0.83
agan          -0.72
util          -0.71
angers        -0.69
oper          -0.68
onto          -0.65
leeve         -0.64
itching       -0.64
etheless      -0.64
obe           -0.63
Positive Logits
bombshell      0.72
Hilton         0.68
reopened       0.68
celebrated     0.66
renewed        0.65
unthinkable    0.64
":""},{"       0.64
headlined      0.63
amended        0.63
Ͻ              0.62
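The dashboard does not say how these two lists were computed. A common approach for sparse-autoencoder features, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and report the most negative and most positive entries over the vocabulary. The pure-Python sketch below follows that assumption. `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical illustrations, not the dashboard's actual code or data.

```python
# Assumption: logit lists = extreme entries of (decoder direction @ unembedding).
# All names and numbers here are hypothetical toy values.

def logit_effects(decoder_row, unembed):
    """Return decoder_row @ unembed as a list with one score per vocabulary token.

    decoder_row: list of d_model floats (the feature's decoder direction).
    unembed: d_model rows, each a list of vocab_size floats.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[k] * unembed[k][v] for k in range(len(decoder_row)))
        for v in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return (k most negative, k most positive) (token, score) pairs."""
    order = sorted(range(len(effects)), key=lambda v: effects[v])
    bottom = [(vocab[v], effects[v]) for v in order[:k]]
    top = [(vocab[v], effects[v]) for v in reversed(order[-k:])]
    return bottom, top


# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_row = [1.0, 2.0]
unembed = [
    [0.5, -1.0, 0.25, 0.0],
    [0.25, 0.0, -0.5, 0.125],
]

bottom, top = top_and_bottom(logit_effects(decoder_row, unembed), vocab, 2)
print("negative:", bottom)  # [('tok_b', -1.0), ('tok_c', -0.75)]
print("positive:", top)     # [('tok_a', 1.0), ('tok_d', 0.25)]
```

Sorting once and slicing both ends keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, a partial selection (for example `heapq.nsmallest` and `heapq.nlargest`) would avoid a full sort.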
Activation Density: 0.324%
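Activation density is usually read as the percentage of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the dashboard does not state its threshold or sample size, so `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 324 active tokens out of 100,000 gives roughly 0.324%.
acts = [0.0] * 99_676 + [1.0] * 324
print(f"{activation_density(acts):.3f}%")
```

Under this reading, a density of 0.324% means the feature is active on roughly 1 token in 300, i.e. it is sparse.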