Explanations
terms related to exchanges or mutual benefits between different entities
Neuron Alignment
Index   Value   % of L₁
479     +0.10   0.3%
1129    +0.09   0.3%
331     +0.09   0.3%
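One plausible reading of the Neuron Alignment table, offered as an assumption rather than the dashboard's documented definition: "Value" is the feature's decoder weight on a given MLP neuron, and "% of L₁" is that weight's magnitude divided by the L₁ norm of the whole decoder row. Under that reading, a weight of about +0.10 holding 0.3% of the norm implies a row L₁ norm of roughly 30 (the displayed figures are rounded). The sketch below ranks neurons by weight magnitude; `neuron_alignment`, `decoder_row` and `top_k` are hypothetical names.

```python
def neuron_alignment(decoder_row, top_k=3):
    """Return (neuron index, weight, share of L1 norm) for the top_k
    largest-magnitude weights in a feature's decoder row.

    Assumption: '% of L1' = |weight| / sum(|w| for w in decoder_row).
    """
    l1_norm = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(decoder_row)),
                    key=lambda i: abs(decoder_row[i]),
                    reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1_norm)
            for i in ranked[:top_k]]
```

For example, `neuron_alignment([0.25, -0.5, 0.125, 0.125], top_k=2)` puts neuron 1 first, holding half of the row's L₁ norm.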
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
1846    +0.10           0.02
1290    +0.09           0.03
1668    +0.09           0.02
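If "P. Corr." denotes the Pearson correlation between the feature's and a neuron's activations over the same tokens, and "Cos Sim." a cosine similarity between weight directions (both assumptions about this dashboard), the two measures are closely related: Pearson correlation is the cosine similarity of the mean-centred series. A minimal sketch of both; the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson_correlation(x, y):
    """Pearson correlation, computed as the cosine similarity of the
    mean-centred series x and y."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x],
                             [b - mean_y for b in y])
```

Read this way, correlations near +0.10 together with cosine similarities near 0.02 would suggest that no single neuron tracks this feature closely.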
Negative Logits
emphat     -0.94
effe       -0.92
ftu        -0.90
suspic     -0.87
desir      -0.87
fup        -0.86
inconce    -0.85
peculi     -0.84
fays       -0.83
fatis      -0.83
Positive Logits
<bos>            0.80
cser             0.54
desertcart       0.51
AndEndTag        0.50
Italijani        0.48
promise          0.47
ற்               0.45
ьаж              0.45
DeleteBehavior   0.45
reward           0.45
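The logit lists can be read, as an assumption about how they were produced, as the feature's decoder direction projected through the model's unembedding matrix, keeping the most positive and most negative tokens. The sketch ignores any final-layer-norm scaling the dashboard may apply; `logit_effects`, `direction`, `unembed` (a d_model × vocabulary matrix given as nested lists) and `vocab` are hypothetical names.

```python
def logit_effects(direction, unembed, vocab, k=10):
    """Return the k most positive and the k most negative (token, logit)
    pairs for a residual-stream direction.

    Assumption: logit[t] = sum_j direction[j] * unembed[j][t].
    """
    logits = [
        sum(direction[j] * unembed[j][t] for j in range(len(direction)))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda p: p[1], reverse=True)
    # Most positive first; most negative first in the reversed list.
    return ranked[:k], ranked[::-1][:k]
```

With a toy two-dimensional direction and a three-token vocabulary, `logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)` returns `[("c", 2.0)]` as the top token and `[("b", -1.0)]` as the bottom one.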
Activation Density: 0.106%
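Activation density is commonly taken to be the fraction of tokens in an evaluation sample on which the feature fires. Treating "fires" as "activation above zero" is an assumption here. Under that reading, 0.106% corresponds to roughly one token in every 940. A minimal sketch; `activation_density` and `threshold` are hypothetical names.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds threshold.

    Assumption: a token counts as active when activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)
```

Multiplying the result by 100 gives the percentage shown on the dashboard: two active tokens out of eight gives a density of 0.25, or 25%.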