INDEX
Explanations
references to specific names or titles starting with "De."
Negative Logits
fir    -0.17
ro     -0.17
rote   -0.16
rist   -0.16
f      -0.15
ra     -0.15
oub    -0.15
xa     -0.15
res    -0.15
wo     -0.15
Positive Logits
acon         0.19
žel          0.19
facto        0.18
oxy          0.17
construct    0.17
eds          0.17
constructed  0.16
initely      0.16
deal         0.16
anship       0.16
Activations Density: 0.051%