Explanation: historical events and notable dates
Negative Logits
Token        Logit
WWII         -0.23
192          -0.21
194          -0.21
191          -0.21
193          -0.19
Hitler       -0.19
Roosevelt    -0.19
WW           -0.18
196          -0.18
190          -0.17
Positive Logits
Token        Logit
184          0.60
183          0.59
185          0.59
182          0.51
186          0.47
Victorian    0.42
181          0.40
187          0.35
nineteenth   0.28
Dickens      0.28
Activation density: 0.375%