INDEX
Explanations
specific years, particularly those associated with historical events
Negative Logits
eer       -0.16
te        -0.16
et        -0.15
Dob       -0.14
enting    -0.14
543       -0.14
yi        -0.14
Obr       -0.14
Harmon    -0.14
armor     -0.14
Positive Logits
chaft     0.21
ship      0.21
oren      0.16
merce     0.15
aras      0.15
sel       0.15
eros      0.15
lv        0.15
ERY       0.15
lug       0.15
Activations Density: 0.013%