INDEX
Explanations
dates in the format of year followed by a non-zero activation value
specific years and notable historical dates
Negative Logits
htaking         -0.70
TEXT            -0.70
EStreamFrame    -0.65
ancest          -0.65
abilia          -0.61
poster          -0.61
kef             -0.58
inhabit         -0.58
imate           -0.57
Num             -0.55
Positive Logits
onwards         0.95
onward          0.91
thereafter      0.82
theless         0.74
later           0.72
after           0.69
fml             0.67
iday            0.66
mornings        0.66
»Ĵ              0.65
Activations Density 0.237%