INDEX
Explanations
words related to personal qualities and characteristics
Neuron Alignment
Index    Value    % of L₁
478      +0.16    0.5%
1385     +0.14    0.5%
1741     +0.14    0.4%
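The Neuron Alignment table can be read as a ranking of the feature's decoder weights over individual MLP neurons. The sketch below makes that reading concrete. It assumes "Value" is the decoder weight on a neuron and "% of L₁" is that weight's absolute value divided by the L₁ norm of the whole decoder row. The function name `neuron_alignment` is hypothetical and not taken from any dashboard library.

```python
def neuron_alignment(decoder_row, k=3):
    """Return the k neurons with the largest decoder weights for one feature.

    Assumption (hypothetical helper): each entry is
    (neuron_index, weight, |weight| / L1 norm of the decoder row).
    """
    l1 = sum(abs(w) for w in decoder_row)
    if l1 == 0:
        raise ValueError("decoder row is all zeros; share of L1 is undefined")
    # Rank neuron indices by signed weight, largest first.
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Usage: a toy 4-neuron decoder row.
top = neuron_alignment([0.1, -0.4, 0.3, 0.2], k=2)
```

Under this reading, shares of roughly 0.5% mean no single neuron carries much of the feature's decoder weight.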
Correlated Neurons
Index    P. Corr.    Cos Sim.
131      +0.16       0.05
332      +0.14       0.06
478      +0.14       0.06
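A minimal sketch of how a Correlated Neurons ranking could be produced follows. It assumes "P. Corr." is the Pearson correlation and "Cos Sim." the cosine similarity between the feature's activations and each neuron's activations over the same tokens. That is an assumption; the dashboard does not define either column. All function names here are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("constant input; correlation is undefined")
    return cov / (sx * sy)


def cosine(xs, ys):
    """Uncentered cosine similarity of two activation sequences."""
    dot = sum(x * y for x, y in zip(xs, ys))
    nx = math.sqrt(sum(x * x for x in xs))
    ny = math.sqrt(sum(y * y for y in ys))
    if nx == 0 or ny == 0:
        raise ValueError("zero vector; cosine similarity is undefined")
    return dot / (nx * ny)


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature's activations.

    neuron_acts maps neuron index -> activations on the same tokens as feature_acts.
    Returns (index, pearson, cosine) tuples, highest correlation first.
    """
    scores = [(i, pearson(feature_acts, acts), cosine(feature_acts, acts))
              for i, acts in neuron_acts.items()]
    scores.sort(key=lambda t: t[1], reverse=True)
    return scores[:k]
```

Because Pearson correlation subtracts each sequence's mean and cosine similarity does not, the two columns can disagree, as they do in the table above.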
Negative Logits
depic      -1.92
reluct     -1.85
maneu      -1.75
disagre    -1.75
encomp     -1.73
gaily      -1.72
inev       -1.72
erad       -1.71
excru      -1.71
accla      -1.71
Positive Logits
ability         0.84
own             0.82
abilities       0.73
life            0.68
overall         0.67
capabilities    0.67
entire          0.64
future          0.61
biggest         0.61
relationship    0.61
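The two logit lists can be read as the tokens whose output logits this feature pushes down most and pushes up most. The sketch below assumes each token's score is the dot product of the feature's decoder direction with that token's unembedding column. It ignores any final layer norm, and the helper name `logit_effects` is hypothetical.

```python
def logit_effects(direction, unembed_columns, k=10):
    """Return (most negative, most positive) token logit effects for one feature.

    Assumption: effect(token) = direction . unembedding column of token.
    unembed_columns maps token string -> column vector (list of floats).
    """
    effects = {
        tok: sum(d * u for d, u in zip(direction, col))
        for tok, col in unembed_columns.items()
    }
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return negative, positive


# Usage with a toy 2-dimensional residual stream and three tokens.
neg, pos = logit_effects([1.0, 0.0], {"a": [2.0, 0.0], "b": [-1.0, 5.0], "c": [0.5, 1.0]}, k=2)
```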
Activation Density: 0.365%
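Activation density is most likely the fraction of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero. The `threshold` parameter and the function name are illustrative assumptions. Read this way, 0.365% works out to roughly one token in 274 (1 / 0.00365 ≈ 274).

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


# Usage: 2 of 8 toy tokens are active.
density = activation_density([0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0])
```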