Explanations
dates written as the leading digits of a number, followed by a space and then the trailing digits
specific numbers or dates, particularly those related to historical events
Negative Logits
Token       Logit
ical        -0.80
orter       -0.78
iques       -0.77
ickers      -0.75
itional     -0.74
ete         -0.72
ovie        -0.71
iguous      -0.71
ional       -0.71
ically      -0.70
Positive Logits
Token       Logit
650         1.05
mph         0.87
91          0.87
th          0.85
58          0.84
92          0.83
85          0.83
71          0.82
94          0.82
87          0.82
Activations Density 0.037%
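
A density figure like the one above is usually read as the share of token positions on which the feature is active. The sketch below shows one way such a percentage could be computed. It assumes that "active" means an activation strictly greater than zero; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 4 of which are active.
example = [0.0] * 9996 + [0.5, 1.2, 0.3, 2.0]
print(f"{activation_density(example):.3f}%")
```

Under this reading, a density of 0.037% would mean the feature fires on roughly 37 of every 100,000 token positions.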