INDEX
Explanations
references to twins, particularly identical twins
Negative Logits
granddaughter   -0.17
юн              -0.15
cí              -0.15
غال             -0.15
トル            -0.15
èn              -0.15
grandson        -0.14
ittle           -0.14
utsch           -0.14
ンピ            -0.14
Positive Logits
twins           0.76
twin            0.73
Twins           0.66
Twin            0.63
identical       0.49
brothers        0.47
frat            0.42
sisters         0.40
Brothers        0.38
siblings        0.35
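The positive and negative logit lists rank vocabulary tokens by how strongly this feature pushes each token's logit up or down. The page does not say how the scores are computed. The sketch below assumes a common convention: a token's score is the dot product of the feature's decoder direction with that token's unembedding column, and the lists keep the k highest and k lowest scores. The names `logit_effects` and `top_and_bottom` are hypothetical, and the 2-dimensional vectors are toy values, not this feature's real weights.

```python
# Toy sketch: rank tokens by a feature's direct effect on the output logits.
# Assumption: effect(token) = dot(decoder_direction, unembedding_column[token]).

def logit_effects(decoder_dir, unembed):
    """Return {token: effect} for every token column in `unembed`."""
    return {
        tok: sum(d * w for d, w in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Hypothetical toy data (d_model = 2), not the feature's actual weights.
decoder_dir = [1.0, 0.0]
unembed = {
    "twins": [0.9, 0.0],
    "twin": [0.8, 0.1],
    "grandson": [-0.2, 0.5],
    "granddaughter": [-0.3, 0.1],
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

With these toy vectors, "twins" and "twin" come out on top and "granddaughter" and "grandson" at the bottom, mirroring the shape (not the values) of the lists above.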
Activation Density: 0.102%
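Activation density is read here as the fraction of tokens on which the feature fires. This is an assumption: the page does not state the dataset or the firing threshold. The sketch below counts an activation as "firing" when it is strictly greater than 0. The `activation_density` helper is hypothetical and the sample activations are made up.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: the feature fires on 1 token out of 1,000.
acts = [0.0] * 999 + [2.4]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.100%
```

Under this definition, a density of 0.102% means the feature is active on roughly 1 of every 980 tokens, which makes it a sparse feature.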