INDEX
Explanations
time-related words such as "year", "month", and "week"
references to time durations, specifically months and years
Negative Logits
emort        -0.75
inav         -0.74
ashtra       -0.74
casts        -0.73
hett         -0.70
uctions      -0.68
jri          -0.67
corridors    -0.66
ãĤ¢ãĥ«       -0.66
ktop         -0.65
Positive Logits
ago          1.14
Ago          1.09
long         0.88
dozen        0.83
night        0.78
iversary     0.74
glass        0.72
apiece       0.72
nd           0.70
arro         0.68
Activations Density: 0.101%
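
For readers unfamiliar with the density figure, the sketch below shows one plausible way such a number could be computed. It assumes the density is the fraction of sampled token positions on which the feature's activation is above zero; the page itself does not state this. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires. This is an illustrative definition, not one taken from
    the page.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 10 of which activate.
sample = [0.0] * 9990 + [1.3] * 10
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.100%
```

Under this reading, a density of 0.101% would mean the feature fires on roughly one token in a thousand, which fits a feature tied to a narrow set of time-related words.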