INDEX
Explanations
connections between siblings and their relationships
Negative Logits
bacteria        -0.23
biology         -0.22
β               -0.21
Biology         -0.21
(beta           -0.19
β               -0.19
batteries       -0.18
Bishop          -0.18
beta            -0.18
broadcasters    -0.18
Positive Logits
bum             0.19
worst           0.17
unbind          0.17
worse           0.17
Bit             0.17
Worst           0.16
aison           0.16
.unbind         0.16
unb             0.16
sisters         0.15
Activations Density 0.137%