INDEX
Explanations
days of the week
mentions of days of the week
Negative Logits
lessly        -0.76
abwe          -0.76
popular       -0.69
Mellon        -0.69
schild        -0.68
hyde          -0.67
ustainable    -0.66
Annotations   -0.66
ively         -0.65
Nicarag       -0.64
Positive Logits
urnal         0.83
zzle          0.82
uler          0.81
eta           0.80
Fri           0.77
etheus        0.77
emate         0.77
afternoon     0.76
ignt          0.74
morning       0.74
Activations Density: 0.011%