INDEX
Explanations
familial names and relationships
Negative Logits
Token    Logit
18       -0.26
13       -0.26
19       -0.25
11       -0.25
12       -0.24
14       -0.24
15       -0.23
16       -0.21
17       -0.20
20       -0.20
Positive Logits
Token    Logit
seven    0.31
7        0.29
six      0.28
seventh  0.26
6        0.25
seven    0.23
sixth    0.23
七       0.23
七       0.22
five     0.22
Activations Density 0.070%