INDEX
Explanations
sentences related to family dynamics and communication issues
Neuron Alignment
Index    Value    % of L₁
381      +0.11    0.3%
599      +0.10    0.3%
1201     +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
382      +0.11       0.06
736      +0.10       0.08
683      +0.08       0.07
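
The page does not define the P. Corr. and Cos Sim. columns. The sketch below assumes that P. Corr. is the Pearson correlation between two activation series recorded over the same tokens, and that Cos Sim. is the cosine similarity between two weight vectors. The function names and inputs are hypothetical and are not taken from the tool that produced this page.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation series (assumed meaning of P. Corr.)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, both left unnormalized (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors (assumed meaning of Cos Sim.)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Under these assumptions the two columns measure different things: the first compares behavior on data, the second compares directions in weight space, so small values in both columns, as in the table above, would indicate only a weak relationship on either measure.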
Negative Logits
fta          -1.43
Abbé         -1.41
wien         -1.41
stockholm    -1.40
secon        -1.40
Simult       -1.37
emphat       -1.37
squa         -1.36
increa       -1.34
accla        -1.33
Positive Logits
mostly     0.81
modest     0.81
nothing    0.76
decent     0.73
minor      0.73
average    0.71
basic      0.70
fairly     0.69
mainly     0.69
maybe      0.69
Activations Density 1.182%
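
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero; the function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)
```

Read this way, a density of 1.182% would mean the feature fires on roughly 1 in 85 sampled tokens.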