INDEX
Explanations
references to specific years and events related to culture and society
Negative Logits
utherford    -0.15
obvious      -0.15
ÏĥÏĦα        -0.14
prot         -0.14
agh          -0.14
oldem        -0.14
nde          -0.13
iegel        -0.13
æĽ           -0.13
ooky         -0.13
Positive Logits
tent         0.16
ittle        0.16
kili         0.15
gb           0.15
asma         0.15
jar          0.14
cre          0.14
Lama         0.13
ipay         0.13
iar          0.13
Activations Density 0.071%