Explanations
names of people
names of people and notable figures
Negative Logits
Token      Logit
targ       -0.81
Munich     -0.77
redit      -0.73
Dian       -0.67
meg        -0.66
phys       -0.65
synd       -0.65
hots       -0.65
Ds         -0.64
PK         -0.64
Positive Logits
Token      Logit
Vas        1.85
Emily      1.71
Emily      1.49
Coffin     1.07
Vance      1.04
Coff       1.00
Rochester  0.98
Gupta      0.94
Caleb      0.94
Claire     0.93
Activations Density: 0.043%