INDEX
Explanations
dates in the format "Month day" with high activation values
dates, specifically in March
Negative Logits
Token       Logit
lder        -0.68
kB          -0.67
tray        -0.66
itaire      -0.66
tox         -0.63
gone        -0.62
constitu    -0.61
quot        -0.59
unnecess    -0.59
folios      -0.59
Positive Logits
Token       Logit
Madness     1.20
nard        0.91
riage       0.91
2019        0.88
rd          0.84
ing         0.84
2015        0.82
onna        0.81
Hare        0.80
yard        0.79
Activations Density 0.023%