Explanations
links, addresses, or specific details within a text
Neuron Alignment
Index   Value   % of L₁
50      +0.17   0.7%
1413    +0.10   0.4%
506     +0.08   0.3%
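The "% of L₁" column appears to express each neuron's weight as a share of the L₁ norm of the whole feature direction. The page does not define it, so treat that reading as an assumption. Under it, a minimal sketch of how such a table could be produced follows. The function name `l1_share` and the toy vector are hypothetical.

```python
def l1_share(vector):
    """Return (index, value, percent of L1 norm) rows, largest |value| first.

    Assumption: percent = |value| / sum(|v| for v in vector) * 100.
    Assumes a nonzero vector (otherwise the L1 norm is zero).
    """
    l1 = sum(abs(v) for v in vector)
    rows = [(i, v, 100.0 * abs(v) / l1) for i, v in enumerate(vector)]
    # Sort by magnitude so the most aligned neurons come first.
    return sorted(rows, key=lambda r: abs(r[1]), reverse=True)


# Toy 4-dimensional direction (hypothetical values, not this feature's weights).
toy = [0.5, -0.25, 0.0, 0.25]
for index, value, pct in l1_share(toy)[:3]:
    print(f"{index:<6}{value:+.2f}  {pct:.1f}%")
```

Read backwards under the same assumption, the first row implies an L₁ norm of roughly 0.17 / 0.007 ≈ 24 for this direction, allowing for rounding of the displayed percentages.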
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1413    +0.17           0.05
559     +0.10           0.05
82      +0.08           0.04
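"Pearson Corr." presumably measures how the feature's activations co-vary with a neuron's activations across sampled tokens. "Cosine Sim." presumably measures the geometric overlap between the feature direction and that neuron's direction. Both readings are assumptions, since the page does not state them. A minimal stdlib sketch of the two statistics follows. The helper names and toy activations are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical per-token activations for one feature and one neuron.
feature_acts = [0.0, 2.0, 0.0, 1.0, 0.0]
neuron_acts = [0.1, 1.5, 0.2, 0.9, 0.0]
print(round(pearson(feature_acts, neuron_acts), 3))
```

Under these definitions the two columns can diverge, as they do here (+0.17 correlation versus 0.05 cosine similarity). One statistic measures co-activation on data and the other measures alignment of weight vectors.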
Negative Logits
Token              Logit
<bos>              -2.55
SequentialGroup    -0.69
ⓧ                  -0.65
principalColumn    -0.63
+#+                -0.62
Waray              -0.59
intios             -0.59
forChild           -0.56
sexu               -0.56
ുറ                 -0.56
Positive Logits
Token       Logit
reluct      1.10
accla       1.08
maneu       1.08
philanth    1.05
milf        1.00
affor       0.98
fortn       0.96
shenan      0.95
disreg      0.95
erad        0.94
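Logit lists like these are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and then taking the most negative and most positive tokens. That this page does exactly that, and whether it applies any final layer norm first, is an assumption. A minimal sketch that ignores layer norm follows. The function `logit_effects` and the toy vocabulary are hypothetical.

```python
def logit_effects(direction, unembed, vocab, k=3):
    """Project a feature direction onto each token's unembedding column.

    direction: list of d_model floats (assumed: the feature's decoder vector)
    unembed:   d_model rows, each a list of len(vocab) floats (W_U)
    Returns (k most negative, k most positive) lists of (token, logit) pairs.
    """
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the direction with column t of W_U.
        logit = sum(direction[i] * unembed[i][t] for i in range(len(direction)))
        scores.append((token, logit))
    scores.sort(key=lambda s: s[1])  # ascending: most negative first
    return scores[:k], scores[::-1][:k]


# Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
vocab = ["a", "b", "c", "d"]
unembed = [[1.0, 0.0, -1.0, 0.5],
           [0.0, 1.0, 0.0, 0.5]]
negative, positive = logit_effects([1.0, -1.0], unembed, vocab, k=2)
print("negative:", negative)
print("positive:", positive)
```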
Activation Density: 0.311%
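Activation density is presumably the fraction of sampled tokens on which the feature fires. At 0.311%, that works out to about 3 tokens in every 1,000, or roughly 3,110 per million. The sketch below assumes "fires" means an activation strictly above a threshold of zero. Both the threshold and the function name `activation_density` are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (0.0 for an empty list)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations over five tokens: two are nonzero.
print(activation_density([0.0, 0.0, 1.2, 0.0, 0.5]))
```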