INDEX
Explanations
mentions of time durations, particularly years and months
Negative Logits
ourg        -0.17
uur         -0.16
203         -0.14
ascade      -0.14
851         -0.13
(           -0.13
compared    -0.13
Ple         -0.13
ucht        -0.13
istan       -0.13
Positive Logits
یان         0.16
RAINT       0.15
spent       0.15
ago         0.15
spent       0.14
otec        0.14
edly        0.14
esine       0.14
quo         0.14
rych        0.14
Activations Density: 0.108%