INDEX
Explanations
mentions of the day "Friday" in various contexts
Negative Logits
ership      -0.97
hip         -0.82
aeda        -0.72
vironment   -0.72
arist       -0.69
ator        -0.69
atche       -0.69
ATOR        -0.68
estern      -0.67
itton       -0.67
Positive Logits
afternoon   1.52
morning     1.49
night       1.38
evening     1.34
mornings    1.30
nights      1.15
Night       1.12
evenings    1.05
night       0.98
morning     0.96
Activations Density 0.023%