INDEX
Explanations
expressions about making decisions in the best interest of a community or group
Neuron Alignment
Index   Value   % of L₁
488     +0.09   0.2%
674     +0.07   0.2%
599     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1543    +0.09      0.04
488     +0.07      0.04
392     +0.07      0.02
Negative Logits
disagre       -1.40
inev          -1.38
reluct        -1.37
indestru      -1.37
increa        -1.36
impractica    -1.35
impra         -1.35
Juf           -1.34
maneu         -1.34
fta           -1.33
Positive Logits
best               0.67
<bos>              0.63
principalColumn    0.61
best               0.58
AssemblyTitle      0.57
welfare            0.56
EditorBrowsable    0.56
UnitTesting        0.55
MathML             0.54
interests          0.54
Activations Density 0.302%