INDEX
Explanations
years written in the format "19XX" or historical events
references to specific years in historical contexts
Negative Logits
iants       -0.80
pora        -0.76
iant        -0.74
aminer      -0.73
arious      -0.72
orem        -0.72
omorphic    -0.69
iance       -0.67
senal       -0.67
ctica       -0.67
Positive Logits
61          0.97
██          0.92
62          0.88
04          0.88
06          0.87
05          0.87
08          0.86
07          0.84
49          0.83
03          0.83
Activations Density 0.014%