INDEX
Explanations
words related to chronological time markers, specifically years and months
references to time durations and ages
Negative Logits
Token        Logit
hip          -0.83
Scrib        -0.75
ufficient    -0.71
terday       -0.68
hov          -0.67
Drawn        -0.66
ourcing      -0.66
aukee        -0.65
ides         -0.65
Haram        -0.65
Positive Logits
Token        Logit
olds         1.15
lengths      0.76
platinum     0.72
old          0.67
iversary     0.66
lockdown     0.64
increments   0.64
experien     0.62
beast        0.61
hz           0.61
Activations Density: 0.077%