INDEX
Explanations
references to historical figures and their titles
Negative Logits
skou       -0.17
mium       -0.16
serter     -0.15
roma       -0.14
iveau      -0.14
"(\<       -0.14
.bridge    -0.14
rompt      -0.14
pta        -0.13
eki        -0.13
POSITIVE LOGITS
Count      0.43
Earl       0.40
count      0.40
Lord       0.38
ear        0.37
Counts     0.36
Ear        0.35
counts     0.35
Duke       0.34
Count      0.33
Activations Density 0.075%