Explanations
numerical references related to historical dates, particularly the late 19th century
Negative Logits
erate    -0.18
adius    -0.17
вар      -0.16
ough     -0.15
arme     -0.14
ering    -0.14
ulas     -0.14
oring    -0.14
uro      -0.14
eral     -0.14
Positive Logits
unks     0.17
ュ       0.14
astos    0.14
nable    0.14
heav     0.14
лекс     0.14
_MC      0.14
리카     0.14
DP       0.14
-cols    0.14
Activations Density 0.012%