Explanations
dates written in the format "March 31" with a higher activation for "March 31" specifically
references to the number 31
Negative Logits
atis      -0.86
atics     -0.85
abwe      -0.85
akens     -0.78
atic      -0.78
akespe    -0.76
fare      -0.71
yrinth    -0.71
anned     -0.70
awaru     -0.70
Positive Logits
st        0.93
¯¯        0.75
rd        0.72
¯¯¯¯      0.70
00        0.69
803       0.67
bow       0.65
msec      0.64
ESS       0.64
backer    0.63
Activations Density 0.041%
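
As a rough illustration of what a density of 0.041% means, the sketch below assumes that density is the fraction of token positions where the feature's activation is above zero. That definition, the function name activation_density, and the sample data are assumptions made for illustration and are not taken from the dashboard. Under that reading, 0.041% corresponds to about 41 active tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition of density; the dashboard does not state how it is computed.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 41 active token positions out of 100,000.
sample = [1.0] * 41 + [0.0] * (100_000 - 41)
density = activation_density(sample)
print(f"{density:.3%}")
```

A feature this sparse fires on very few tokens, which fits an explanation as narrow as a specific date or number.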