INDEX
Explanations
dates written as years
mentions of specific years, particularly 1938, 1939, and 1940
Negative Logits
Token       Logit
isters      -0.84
por         -0.81
orns        -0.78
keys        -0.77
ops         -0.76
olar        -0.75
olves       -0.74
yx          -0.73
atom        -0.72
andering    -0.71
Positive Logits
Token       Logit
1942         1.18
1941         1.13
1939         1.13
1943         1.12
1936         1.09
1937         1.07
1938         1.04
1944         1.01
1940         0.97
1935         0.97
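Lists like the two above are commonly produced by projecting a feature's decoder direction onto the unembedding matrix and keeping the most negative and most positive scores. The sketch below assumes that method; it is not confirmed to be how this particular dashboard computed its values (some tools also apply the final layer norm or rescale first). The names `logit_effects`, `w_dec`, `W_U` and `vocab`, and the toy numbers, are hypothetical.

```python
import numpy as np

def logit_effects(w_dec, W_U, vocab, k=10):
    """Rank tokens by a feature direction's direct effect on the logits.

    Assumed (hypothetical) inputs:
      w_dec: (d_model,) decoder direction of the feature
      W_U:   (d_model, vocab_size) unembedding matrix
      vocab: list of token strings of length vocab_size
    """
    scores = w_dec @ W_U            # one logit contribution per token
    order = np.argsort(scores)      # token indices, ascending by score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["1939", "1940", "isters", "por"]
W_U = np.array([[1.0, 0.9, -0.8, -0.5],
                [0.0, 0.1, 0.0, -0.3]])
w_dec = np.array([1.0, 0.0])
pos, neg = logit_effects(w_dec, W_U, vocab, k=2)
print(pos)  # [('1939', 1.0), ('1940', 0.9)]
print(neg)  # [('isters', -0.8), ('por', -0.5)]
```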
Activation density: 0.010%
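Activation density is typically the fraction of tokens on which a feature fires, so 0.010% means roughly 1 token in 10,000. The sketch below assumes "fires" means an activation strictly greater than zero; the dashboard's exact threshold is not stated, and `activation_density` and the synthetic activations are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Synthetic example: the feature is active on 10 out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:10] = 2.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.010%
```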