INDEX
Explanations
dates in the month/day/year format
specific dates and references related to time
NEGATIVE LOGITS
Token       Logit
Zhu         -0.68
odi         -0.68
pron        -0.67
np          -0.62
rons        -0.62
glers       -0.62
omatic      -0.61
nanop       -0.60
filament    -0.60
Penny       -0.60
POSITIVE LOGITS
Token       Logit
15          1.57
14          1.57
15          1.55
14          1.55
16          1.54
16          1.54
17          1.42
17          1.38
13          1.28
13          1.24
Activations Density: 0.183%