INDEX
Explanations
names of historical figures and their affiliations
Negative Logits
ts       -0.26
ta       -0.24
tn       -0.24
to       -0.24
tes      -0.23
techn    -0.23
tem      -0.23
tek      -0.23
td       -0.23
tools    -0.23
Positive Logits
ki         0.29
ky         0.26
dorf       0.26
ký         0.26
cheid      0.26
chaft      0.25
hire       0.24
piration   0.24
s          0.23
chrift     0.23
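The two lists above look like the ten most negative and ten most positive entries of this feature's per-token logit effect. A common way to compute that effect is to take the feature's decoder direction and multiply it by the model's unembedding matrix. The dashboard may normalize or otherwise post-process these values, so the sketch below is an assumption about the method, not a description of it. The names `logit_effects` and `top_logit_effects`, and the toy inputs, are hypothetical.

```python
import heapq


def logit_effects(decoder_dir, unembed):
    """Project a feature's decoder direction onto the vocabulary.

    Assumption: effect[j] = sum_i decoder_dir[i] * unembed[i][j], where
    `unembed` is a d_model x vocab matrix given as a list of rows.
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal number of unembed rows")
    vocab_size = len(unembed[0])
    return [
        sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j in range(vocab_size)
    ]


def top_logit_effects(effects, vocab, k=10):
    """Return the k most positive and k most negative (token, effect) pairs."""
    pairs = list(zip(vocab, effects))
    positive = heapq.nlargest(k, pairs, key=lambda p: p[1])  # largest first
    negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])  # most negative first
    return positive, negative
```

With a toy four-token vocabulary, `top_logit_effects([0.29, -0.26, 0.01, 0.26], ["ki", "ts", "the", "dorf"], k=2)` returns `ki` and `dorf` as the positive pair and `ts` and `the` as the negative pair.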
Activations Density 0.124%
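Assuming activation density means the fraction of token positions where the feature's activation is above zero, 0.124% works out to about 124 firing positions per 100,000 tokens. The sketch below computes that fraction. The function name and the zero threshold are assumptions for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation is > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)
```

For example, 124 nonzero activations among 100,000 tokens give a density of 0.00124, which is 0.124%.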