Explanations
references to specific individuals and their birth years
Negative Logits
wards        -0.17
etim         -0.15
opleft       -0.15
feit         -0.14
ersist       -0.14
어가          -0.14
elope        -0.14
erea         -0.14
edor         -0.14
ань          -0.14
Positive Logits
iane          0.17
softmax       0.15
ongan         0.15
/bind         0.15
Gregg         0.14
.foundation   0.14
bil           0.14
Alman         0.14
ansa          0.14
sí            0.14
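Each logit list is the set of the most negative or most positive entries of a per-token logit-effect vector for this feature. As an illustration of that selection step only, here is a minimal sketch that assumes the effects are already available as a mapping from token string to value. The function name `top_logits` and the toy input (a few values copied from the lists above) are hypothetical; this is not the dashboard's own code.

```python
import heapq


def top_logits(effects: dict[str, float], k: int = 10):
    """Return the k most negative and the k most positive (token, value) pairs.

    Hypothetical helper: `effects` is assumed to map each token to its logit effect.
    """
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    return negative, positive


# Toy input: a handful of values copied from the lists above, not the full vocabulary.
effects = {"wards": -0.17, "etim": -0.15, "iane": 0.17, "softmax": 0.15, "Gregg": 0.14}
neg, pos = top_logits(effects, k=2)
print(neg)  # [('wards', -0.17), ('etim', -0.15)]
print(pos)  # [('iane', 0.17), ('softmax', 0.15)]
```

`heapq.nsmallest`/`nlargest` avoid sorting the whole vocabulary when only the top k entries are needed.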
Activation density: 0.010%
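A density of 0.010% corresponds to roughly one token in 10,000. The sketch below shows that arithmetic. It assumes density means the fraction of evaluated tokens on which the feature's activation is strictly positive. The zero threshold, the helper name `activation_density`, and the synthetic activation list are all illustrative assumptions, not details taken from this page.

```python
def activation_density(activations: list[float]) -> float:
    """Fraction of tokens on which the feature is active (activation > 0).

    Hypothetical helper; the strict > 0 threshold is an assumption.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Synthetic example: 1 active token out of 10,000.
acts = [0.0] * 9_999 + [3.2]
print(f"{activation_density(acts):.3%}")  # 0.010%
```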