INDEX
Explanations
dates written as a day, month, and year, such as "19th July 1960"
references to the 19th century
Negative Logits
Token      Logit
aminer     -0.85
orc        -0.83
anguage    -0.75
endum      -0.74
ensional   -0.74
orem       -0.72
arling     -0.72
ovie       -0.72
heed       -0.69
etitive    -0.69
Positive Logits
Token      Logit
th         1.00
âĸĪâĸĪ     0.93
61         0.81
09         0.81
08         0.81
07         0.80
05         0.80
06         0.79
059        0.79
03         0.78
Activations Density: 0.027%