INDEX
Explanations
years mentioned in a specific format, such as "from XXXX" with a high activation value
specific years and durations related to historical events or time periods
Negative Logits
ĺħ        -0.61
multip    -0.61
Trop      -0.55
rored     -0.52
needed    -0.50
ACTION    -0.50
cryst     -0.50
ATHER     -0.50
favor     -0.48
Nature    -0.48
Positive Logits
onwards     1.61
onward      1.55
until       1.25
till        1.20
until       1.09
through     0.94
downwards   0.88
til         0.86
inception   0.84
til         0.83
Activations Density 0.082%