INDEX
Explanations
concepts related to restrictions or constraints, particularly those that evoke a sense of intensity or pressure
Neuron Alignment
Index   Value   % of L₁
82      +0.13   0.7%
99      +0.10   0.6%
172     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
82      +0.13      0.03
99      +0.10      0.02
338     +0.10      0.02
Negative Logits
Token      Logit
safer      -1.58
brighter   -1.56
ylum       -1.52
asma       -1.51
mur        -1.48
wealth     -1.42
osp        -1.40
clearer    -1.39
possible   -1.39
murdered   -1.37
Positive Logits
Token   Logit
ĻĤ      3.30
·       2.93
į       2.88
ģ       2.80
ľĵ      2.64
Ł       2.43
ī       2.41
ĥ½      2.40
£       2.40
ħ       2.34
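The positive-logit tokens look like the printable stand-ins that a byte-level BPE vocabulary uses for raw bytes. The page does not name the tokenizer, so the following is only a sketch under the assumption that it follows the GPT-2 byte-to-unicode convention. The helper names (`bytes_to_unicode`, `token_to_bytes`, `is_utf8_continuation`) are illustrative and not part of the dashboard.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2's byte-level BPE.

    Assumption: the dashboard's tokenizer uses this convention.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes that a vocabulary string stands for."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def is_utf8_continuation(raw: bytes) -> bool:
    """True if every byte lies in 0x80-0xBF, the UTF-8 continuation range."""
    return all(0x80 <= b <= 0xBF for b in raw)


if __name__ == "__main__":
    # Tokens copied from the Positive Logits list above.
    tokens = ["ĻĤ", "·", "į", "ģ", "ľĵ", "Ł", "ī", "ĥ½", "£", "ħ"]
    for tok in tokens:
        raw = token_to_bytes(tok)
        print(tok, raw.hex(" "), is_utf8_continuation(raw))
```

If the assumption holds, running the script shows which raw bytes each token encodes and whether they are UTF-8 continuation bytes, that is, fragments of multi-byte characters rather than standalone text.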
Activations Density 0.145%