INDEX
Explanations
references to notable individuals and their relationships
Negative Logits
enco       -0.15
ermen      -0.14
Sherman    -0.14
imos       -0.14
Ted        -0.14
Burke      -0.14
otify      -0.14
lt         -0.14
Rim        -0.14
iron       -0.13
Positive Logits
Prince        0.28
Prince        0.28
prince        0.24
Purple        0.23
Purple        0.23
princ         0.22
Pais          0.21
Minneapolis   0.20
purple        0.18
princes       0.18
Activations Density 0.005%