INDEX
Explanations
numerals with space in between them
questions and reflections on past events
Negative Logits
conduc        -0.76
maximizing    -0.73
sacrific      -0.69
uties         -0.68
yielding      -0.65
subdu         -0.65
maximize      -0.64
advant        -0.64
icient        -0.63
endish        -0.63
Positive Logits
terday        1.42
happened      1.03
yesterday     1.03
Yesterday     0.98
cember        0.97
hadn          0.95
forgot        0.94
ometime       0.93
ago           0.93
hindsight     0.91
Activations Density: 0.592%