INDEX
Explanations
names and relationships
Negative Logits
Stuart    -0.17
McKay     -0.16
Roths     -0.15
zÅij      -0.14
Byron     -0.14
decre     -0.14
éϵ        -0.14
Twe       -0.14
gaard     -0.14
ROY       -0.13
Positive Logits
Thomas    0.24
Humph     0.20
Hum       0.20
Roger     0.19
Barth     0.19
Ralph     0.19
Nicholas  0.18
Thomas    0.18
John      0.18
Sym       0.17
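Logit lists like these are commonly produced by taking the dot product of the feature's decoder direction with each token's unembedding vector, then keeping the k largest and k smallest values. The sketch below assumes this dashboard follows that convention; the function names `logit_effects` and `top_and_bottom_k`, the 2-dimensional toy vectors, and all numbers are hypothetical and not taken from this page.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the (hypothetical) decoder direction with each token's unembedding vector."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, vec)) for tok, vec in unembed.items()}

def top_and_bottom_k(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (positive, negative): the k largest and the k smallest logit effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Hypothetical toy data: a 2-dimensional residual stream and four tokens.
decoder_dir = [1.0, -0.5]
unembed = {
    "Thomas": [0.3, 0.1],
    "Stuart": [-0.1, 0.2],
    "John": [0.2, 0.0],
    "Byron": [-0.1, 0.0],
}

positive, negative = top_and_bottom_k(logit_effects(decoder_dir, unembed), k=2)
print("Positive:", [(t, round(v, 2)) for t, v in positive])
print("Negative:", [(t, round(v, 2)) for t, v in negative])
```

Ranking by the raw dot product means a token can appear twice with different values when the tokenizer distinguishes variants (for example with and without a leading space), which may explain the two "Thomas" entries above.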
Activation Density: 0.015%
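Activation density is usually read as the fraction of sampled tokens on which the feature's activation exceeds zero, so 0.015% corresponds to roughly 15 tokens per 100,000. A minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample are hypothetical.

```python
from typing import List

def activation_density(acts: List[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is strictly above `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return fired / len(acts)

# Hypothetical sample: 100,000 tokens, of which the feature fires on 15.
acts = [0.0] * 99_985 + [1.0] * 15
density = activation_density(acts)
print(f"Activation density: {density * 100:.3f}%")
```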