INDEX
Explanations
time-related words, such as days, weeks, and months
Negative Logits
inav        -0.82
urities     -0.77
versely     -0.71
reme        -0.68
ifice       -0.67
jri         -0.66
untled      -0.65
Unc         -0.63
icator      -0.63
olini       -0.62
Positive Logits
ago          1.17
'            1.07
apiece       0.98
hops         0.91
gestation    0.90
pring        0.89
Ago          0.88
consecut     0.86
'/           0.84
hift         0.83
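The page does not say how the logit lists are produced. A common approach for sparse-autoencoder feature dashboards is to score each token by the dot product of the feature's decoder direction with that token's unembedding vector, then list the most positive and most negative scores. The sketch below assumes that method. The function names `logit_effects` and `top_and_bottom` are hypothetical, and the 2-d vectors are made-up toy data, not the real model's weights.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect(token) = decoder_direction . unembedding[token].

def logit_effects(decoder_dir: list[float], unembed: dict[str, list[float]]) -> dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}

def top_and_bottom(effects: dict[str, float], k: int):
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]       # most negative first
    top = ranked[::-1][:k]    # most positive first
    return top, bottom

# Toy 2-d example with made-up vectors (not real model weights).
decoder_dir = [1.0, 0.5]
unembed = {
    "ago": [1.0, 0.2],      # 1.0 + 0.1 = 1.1
    "hops": [0.5, 0.6],     # 0.5 + 0.3 = 0.8
    "inav": [-0.6, -0.4],   # -0.6 - 0.2 = -0.8
    "reme": [-0.5, -0.2],   # -0.5 - 0.1 = -0.6
}
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print([t for t, _ in top])     # ['ago', 'hops']
print([t for t, _ in bottom])  # ['inav', 'reme']
```

Sorting once and slicing from both ends produces both lists in a single pass. With a real vocabulary, a partial selection such as `heapq.nlargest` and `heapq.nsmallest` avoids sorting every token.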
Activation Density: 0.121%
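A density of 0.121% means the feature fires on roughly 1 in 826 tokens. The sketch below assumes density is the percentage of sampled token positions where the feature's activation is above zero. The function name `activation_density` and its `threshold` parameter are hypothetical, and the activations are synthetic, built only to reproduce the 0.121% figure.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Synthetic data: 121 firing positions out of 100,000 tokens gives 0.121%.
acts = [0.0] * 100_000
for i in range(0, 121 * 800, 800):  # 121 evenly spaced positions
    acts[i] = 2.5
print(f"{activation_density(acts):.3f}%")  # 0.121%
```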