INDEX
Explanations
years, particularly in historical contexts
references to specific years and historical events
Negative Logits
Token      Logit
ularity    -0.91
ocks       -0.71
egal       -0.70
iant       -0.69
anu        -0.69
gart       -0.68
umen       -0.66
amin       -0.66
ute        -0.66
stud       -0.65
Positive Logits
Token      Logit
1916       0.90
abwe       0.76
Osw        0.75
1915       0.74
1914       0.74
1917       0.73
Sakuya     0.71
1918       0.71
Ez         0.68
1909       0.68
Activations Density 0.035%
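
To give a sense of scale for the density figure, here is a minimal sketch. It assumes that activation density means the fraction of sampled token positions on which the feature has a non-zero activation; the source does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is counted per token position, with any
    activation above `threshold` treated as the feature firing.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 7 active positions out of 20,000 tokens.
sample = [0.0] * 19_993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 5.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.035% corresponds to the feature firing on roughly 1 token in every 2,900, which is in keeping with a feature that responds mainly to specific year tokens.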