INDEX
Explanations
phrases related to technical instructions and capabilities
Neuron Alignment
Index   Value   % of L₁
690     +0.11   0.3%
5       +0.11   0.3%
2034    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
5       +0.11      0.05
968     +0.11      0.04
1406    +0.08      0.04
Negative Logits
!!</       -0.65
)':        -0.59
incess     -0.58
Viene      -0.58
Kategor    -0.58
pymongo    -0.57
.。         -0.57
:");       -0.57
rimb       -0.56
Langue     -0.56
Positive Logits
we         0.61
you        0.54
able       0.50
jurassic   0.49
they       0.49
putstr     0.48
can        0.47
voila      0.47
можно      0.47
تفصیلات    0.47
Activations Density 0.342%