Explanations
mentions of the word "tomorrow."
Negative Logits
Token       Logit
out         -0.17
uts         -0.16
sess        -0.16
"default    -0.15
aware       -0.14
pery        -0.14
adal        -0.14
ioned       -0.14
etable      -0.14
nh          -0.14
Positive Logits
Token       Logit
morning     0.39
afternoon   0.30
mor         0.25
night       0.24
Morning     0.24
evening     0.24
land        0.24
Morning     0.23
же          0.21
mornings    0.20
Activations Density: 0.012%