INDEX
Explanations
dates written in a specific format: month, year, and parentheses around the day with varying activations for different occurrences
parentheses and their associated numerical values
Negative Logits
Token        Logit
centr        -0.81
compromised  -0.71
pastoral     -0.66
Pax          -0.63
Punk         -0.62
pitched      -0.62
Maced        -0.61
bipolar      -0.61
perspect     -0.61
polarized    -0.60
Positive Logits
Token        Logit
24           1.02
28           0.99
13           0.98
16           0.98
33           0.97
15           0.96
14           0.95
12           0.94
39           0.94
17           0.94
Activations Density 0.107%