INDEX
Explanations
phrases related to the role of various entities or concepts in different contexts
Neuron Alignment
Index    Value    % of L₁
752      +0.18    0.6%
50       +0.15    0.5%
16       +0.13    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
16       +0.18       0.06
752      +0.15       0.05
50       +0.13       0.04
Negative Logits
raught             -0.53
<bos>              -0.53
createState        -0.52
InitStruct         -0.49
IGraphics          -0.48
">#                -0.47
couldn             -0.46
addContainerGap    -0.45
ždy                -0.44
)))));             -0.44
Positive Logits
hcm          0.98
considér     0.92
ecru         0.92
swarovski    0.91
luxuriant    0.88
santiago     0.88
hairc        0.88
milano       0.85
ricardo      0.83
roberto      0.83
Activations Density: 0.272%