Explanations
dates, particularly formatted as month/day and year
Negative Logits
30        -0.53
Thirty    -0.40
thirty    -0.39
۳۰        -0.36
Thirty    -0.36
31        -0.36
          -0.25
304       -0.24
311       -0.23
030       -0.22
Positive Logits
Valentine  0.22
28         0.19
26         0.19
25         0.19
27         0.18
ruary      0.18
22         0.17
24         0.17
23         0.17
281        0.17
Activations Density 0.023%