Explanations
dates mentioned in text
days of the week and specific dates
Negative Logits
Token       Logit
agos        -0.86
allo        -0.86
abet        -0.84
ordes       -0.82
rahim       -0.82
ris         -0.79
omething    -0.78
igated      -0.78
abe         -0.78
uchin       -0.77
Positive Logits
Token       Logit
mornings     1.23
nights       1.22
Night        1.20
morning      1.20
afternoon    1.18
night        1.17
evenings     1.12
evening      1.11
Nights       0.95
Night        0.92
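The page does not say how these logit lists were produced. Dashboards of this kind commonly project the feature's decoder direction through the model's unembedding matrix and keep the lowest and highest entries. The sketch below assumes that method. `decoder_direction`, `unembed`, `top_logit_effects` and the four-token vocabulary are hypothetical stand-ins, not the real model's weights.

```python
import numpy as np

def top_logit_effects(decoder_direction, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: decoder_direction has shape (d_model,) and unembed has
    shape (d_model, vocab_size). Both are hypothetical illustration inputs.
    """
    # Direct effect of the feature direction on every output token's logit.
    logits = decoder_direction @ unembed  # shape (vocab_size,)
    order = np.argsort(logits)  # token indices, ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a 2-dimensional "model" with a 4-token vocabulary.
vocab = ["morning", "night", "agos", "allo"]
unembed = np.array([
    [1.0, 0.5, -1.0, -0.2],
    [0.2, 1.0, 0.1, -0.9],
])
direction = np.array([1.0, 0.5])
neg, pos = top_logit_effects(direction, unembed, vocab, k=2)
print(neg)  # agos and allo, the two lowest logits
print(pos)  # morning and night, the two highest logits
```

Sorting once and slicing from both ends yields both lists from a single pass over the vocabulary. A real model's vocabulary has tens of thousands of entries, so only the extremes are displayed.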
Activation Density: 0.088%
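Activation density is usually reported as the percentage of tokens on which the feature's activation is above zero. Under that reading, 0.088% corresponds to roughly 88 tokens in every 100,000. Below is a minimal sketch that assumes a hypothetical per-token activation array and a zero threshold, neither of which comes from the source.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: `activations` is a hypothetical 1-D array with one value
    per token, and a zero threshold marks a token as "active".
    """
    activations = np.asarray(activations)
    active = np.count_nonzero(activations > threshold)
    return 100.0 * active / activations.size

# 88 active tokens out of 100,000 reproduces the density reported above.
acts = np.zeros(100_000)
acts[:88] = 1.5
density = activation_density(acts)
print(f"{density:.3f}%")  # prints "0.088%"
```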