INDEX
Explanations
the specific time reference "week" within text
occurrences of the phrase "this week" or references to recent events in a temporal context
Negative Logits
gery        -0.72
itialized   -0.63
ggle        -0.63
erate       -0.61
UES         -0.60
vae         -0.59
vec         -0.59
perpetual   -0.59
ut          -0.59
Control     -0.58
Positive Logits
afternoon   0.81
evening     0.75
marked      0.73
flower      0.72
subpoen     0.72
unveiling   0.71
announcing  0.71
神          0.70
lished      0.68
女          0.67
Activations Density: 0.107%