Explanations
phrases indicating associations or connections between different entities or concepts
Neuron Alignment
Index   Value   % of L₁
283     +0.15   0.9%
148     +0.12   0.7%
377     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
283     +0.15      0.02
148     +0.12      0.01
463     +0.12      0.01
Negative Logits
llo           -1.59
chat          -1.55
uvre          -1.44
genstein      -1.41
Environment   -1.37
blast         -1.35
tsd           -1.34
instructed    -1.32
enabled       -1.30
HS            -1.29
Positive Logits
groups        1.80
with          1.79
fields        1.78
domains       1.76
ities         1.75
subgroups     1.70
categories    1.64
pathways      1.64
gaps          1.64
regions       1.63
Activations Density: 0.075%