Explanations
dates in the 1920s, 1930s, and 1940s
specific historical years or dates relevant to events
Negative Logits
por       -0.69
ror       -0.64
ioned     -0.64
opher     -0.62
lez       -0.62
task      -0.60
erate     -0.60
twe       -0.58
spin      -0.58
webcam    -0.58
Positive Logits
1936         0.85
1935         0.85
1939         0.85
1938         0.84
1934         0.82
1933         0.81
Churchill    0.80
-'           0.79
1937         0.78
1914         0.77
Activations Density: 0.051%