INDEX
Explanations
time-related terms like days, weeks, months, and years
references to recent events or time-specific occurrences
Negative Logits
Token                             Logit
////////////////////////////////  -0.68
arta                              -0.64
SPA                               -0.60
itute                             -0.59
ãĥĺ                               -0.58
eers                              -0.57
itutes                            -0.57
heid                              -0.56
maximum                           -0.55
please                            -0.54
Positive Logits
Token       Logit
when        1.20
when        1.11
unveiling   0.83
announcing  0.76
WHEN        0.75
regarding   0.74
after       0.70
morning     0.68
alleging    0.67
unveil      0.67
Activations Density 0.247%