INDEX
Explanations
keywords related to causality and consequentialism
Neuron Alignment
Index   Value   % of L₁
1013    +0.14   0.4%
394     +0.09   0.3%
2015    +0.09   0.3%
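The "% of L₁" column is not defined on the page. One plausible reading, stated here as an assumption, is that it reports the absolute value of one weight as a share of the L₁ norm of the whole weight vector. A minimal sketch of that reading follows; the function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: this is how a "% of L1" column could be computed;
    the source does not state the definition.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector for illustration only (not dashboard data).
toy_weights = [0.5, -0.25, 0.25]
toy_share = l1_share(toy_weights, 0)  # 0.5 out of an L1 norm of 1.0
```

Under this reading, small percentages such as those listed above would mean that no single neuron accounts for much of the vector's total weight.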
Correlated Neurons
Index   P. Corr.   Cos Sim.
1273    +0.14      0.03
16      +0.09      0.07
394     +0.09      0.04
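The two columns above need not agree, because they measure different things. Reading "P. Corr." as Pearson correlation (an assumption; the page does not expand the abbreviation), the sketch below shows the general relationship: Pearson correlation is the cosine similarity of the mean-centered vectors. The page does not say which vectors each column is computed from, so the inputs here are toy data and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy vectors for illustration only (not dashboard data).
rising = [1.0, 2.0, 3.0]
falling = [3.0, 2.0, 1.0]
toy_cos = cosine_similarity(rising, falling)        # positive: both vectors are all-positive
toy_pearson = pearson_correlation(rising, falling)  # negative: one rises while the other falls
```

The toy pair has a positive cosine similarity but a negative Pearson correlation, which is why the two columns are best read separately.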
Negative Logits
Token    Logit
rafra    -0.96
Souha    -0.94
Mâ       -0.81
renfer   -0.81
Præ      -0.81
Autre    -0.80
Græ      -0.79
Châ      -0.79
Godt     -0.78
Câ       -0.78
Positive Logits
Token        Logit
submitting   0.73
preparing    0.71
buying       0.69
việc         0.69
picking      0.68
collecting   0.68
designing    0.68
putting      0.68
creating     0.68
obtaining    0.67
Activations Density: 0.629%
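The page does not define activation density. A common reading, taken here as an assumption, is the percentage of sampled tokens on which the feature's activation is above zero. A minimal sketch of that reading follows; `activation_density` and its `threshold` parameter are hypothetical names, and the sample values are toy data.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature fires; the source does not state the definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy activations for illustration only (not dashboard data).
toy_activations = [0.0, 0.0, 1.2, 0.0]
toy_density = activation_density(toy_activations)  # one of four values is active
```

Under this reading, a density well below 1% would describe a feature that fires on only a small fraction of tokens.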