INDEX
Explanations
links or relationships between different concepts, themes, or ideas
Neuron Alignment
Index    Value    % of L₁
1135     +0.11    0.3%
1107     +0.10    0.3%
872      +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1135     +0.11       0.04
284      +0.10       0.05
642      +0.10       0.04
Negative Logits
igno      -1.07
embra     -1.06
robus     -1.06
dises     -1.04
revan     -1.03
contex    -1.02
curi      -1.00
Simult    -0.99
oner      -0.99
emphat    -0.98
Positive Logits
between     1.45
between     1.26
Between     1.18
Between     1.14
между       1.13
zwischen    1.10
BETWEEN     1.06
BETWEEN     1.05
tussen      1.03
giữa        1.02
Activations Density: 0.213%