INDEX
Explanations
mentions of group membership and affiliation
Neuron Alignment
Index    Value    % of L₁
421      +0.11    0.3%
938      +0.10    0.3%
122      +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
938      +0.11       0.04
1516     +0.10       0.03
1861     +0.10       0.04
Negative Logits
grati       -0.62
carina      -0.58
Allister    -0.57
nmax        -0.55
artney      -0.54
brava       -0.52
haviour     -0.51
omnium      -0.51
Cormack     -0.51
lorenzo     -0.51
Positive Logits
member        1.21
Member        1.13
member        1.09
members       1.07
Members       1.06
Member        1.03
members       1.00
Members       0.99
MEMBER        0.97
membership    0.93
Activations Density 0.043%