INDEX
Explanations
dates related to specific historical events
references to specific years, particularly during World War II
Negative Logits
Token     Logit
atives    -0.78
ative     -0.77
hed       -0.73
ator      -0.72
ional     -0.71
att       -0.70
essee     -0.70
keys      -0.70
amin      -0.70
inqu      -0.70
Positive Logits
Token     Logit
1942      1.16
1943      1.13
1944      1.10
1941      1.05
1939      0.99
1938      0.95
1937      0.94
1914      0.90
1945      0.90
1936      0.90
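
As a rough check of the explanations against the positive logits listed above, the sketch below counts how many of the top tokens are four-digit years and how many fall in 1939-1945. The dictionary simply restates the values shown on this page; the helper name `is_year` and the choice of 1939-1945 as the World War II range are assumptions made for illustration, not part of the dashboard.

```python
# Top positive logits, copied from the list above (token -> logit).
positive_logits = {
    "1942": 1.16, "1943": 1.13, "1944": 1.10, "1941": 1.05, "1939": 0.99,
    "1938": 0.95, "1937": 0.94, "1914": 0.90, "1945": 0.90, "1936": 0.90,
}

# Assumption: treat 1939-1945 inclusive as the World War II years.
WWII_YEARS = range(1939, 1946)


def is_year(token: str) -> bool:
    """Hypothetical helper: True for tokens that look like a four-digit year."""
    return token.isdigit() and len(token) == 4


# Keep only year-like tokens, then pick out those inside the WWII range.
years = [int(tok) for tok in positive_logits if is_year(tok)]
wwii = sorted(y for y in years if y in WWII_YEARS)

print(len(years), len(wwii), wwii)
# 10 6 [1939, 1941, 1942, 1943, 1944, 1945]
```

All ten tokens are years and six of them fall in 1939-1945, which is consistent with the second explanation; the remaining four (1914, 1936, 1937, 1938) fit the broader "historical events" reading.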
Activations Density 0.016%
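
To give a sense of scale for the density figure, here is a minimal sketch that assumes density means the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the synthetic sample are all assumptions; the page itself does not say how the figure was computed.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition of density; the dashboard does not state its own.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample (not real data): 16 active tokens out of 100,000.
sample = [1.0] * 16 + [0.0] * 99_984
density = activation_density(sample)

print(f"{density:.3%}")
# 0.016%
```

Under this reading, a density of 0.016% corresponds to roughly 16 firing tokens per 100,000, so the feature is very sparse.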