INDEX
Explanations
terms related to accountability and oversight in governance
Neuron Alignment
Index  Value  % of L₁
215    +0.15  0.8%
288    +0.13  0.7%
92     +0.13  0.7%
Correlated Neurons
Index  P. Corr.  Cos Sim.
437    +0.15     0.04
67     +0.13     0.04
349    +0.13     0.04
Negative Logits
Token     Logit
»¿        -1.91
º         -1.69
Ŀ         -1.57
linking   -1.57
precise   -1.53
ians      -1.51
accurate  -1.49
women     -1.48
details   -1.45
apolis    -1.41
Positive Logits
Token      Logit
blockList  1.67
psin       1.51
amycin     1.46
imab       1.44
imes       1.36
ERA        1.36
ower       1.35
chus       1.33
("\        1.31
warz       1.31
Activations Density 0.033%