INDEX
Explanations
references to family members or interpersonal relationships with a focus on the maternal figure
Neuron Alignment
Index    Value    % of L₁
1793     +0.11    0.4%
596      +0.11    0.4%
506      +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1793     +0.11       0.03
1515     +0.11       0.02
1525     +0.11       0.03
Negative Logits
raste        -0.53
HStack       -0.50
vább         -0.49
pinos        -0.47
valla        -0.46
***/         -0.46
pincode      -0.46
strane       -0.45
ordinaria    -0.45
vantaggi     -0.45
Positive Logits
mother     1.20
Mother     1.10
Mother     1.09
mother     1.07
mothers    1.04
MOTHER     1.02
Mothers    0.96
mom        0.93
Mothers    0.93
MOTHER     0.91
Activations Density: 0.066%