Explanations
numerical information such as statistics, coordinates, instructions, and code snippets
Neuron Alignment
Index   Value   % of L₁
32      +0.14   0.5%
204     +0.14   0.5%
506     +0.13   0.5%
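The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's weight vector, i.e. |wᵢ| / Σⱼ|wⱼ|. The sketch below illustrates that reading on a made-up toy vector. The function name `l1_share` and the values are hypothetical, and the source does not confirm that the dashboard computes the column this way.

```python
import numpy as np

def l1_share(weights: np.ndarray, top_k: int = 3):
    """Return (index, value, share of L1 norm) for the top_k largest-magnitude entries."""
    l1 = np.abs(weights).sum()                      # L1 norm of the whole vector
    order = np.argsort(-np.abs(weights))[:top_k]    # indices sorted by descending |w_i|
    return [(int(i), float(weights[i]), float(abs(weights[i]) / l1)) for i in order]

# Toy vector (illustrative values only, not the feature's real weights)
w = np.array([0.5, -1.5, 2.0, 1.0])
rows = l1_share(w, top_k=2)
for idx, val, share in rows:
    print(f"{idx}\t{val:+.2f}\t{share:.1%}")
```

Under this reading, a value of +0.14 accounting for 0.5% of L₁ would imply a total L₁ norm of roughly 0.14 / 0.005 = 28, which would mean the weight is spread thinly over many neurons.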
Correlated Neurons
Index   P. Corr.   Cos Sim.
506     +0.14      0.03
204     +0.14      0.03
1045    +0.13      0.03
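Pearson correlation and cosine similarity can differ for the same pair of vectors, because Pearson correlation subtracts each vector's mean before comparing. The sketch below shows the two standard definitions on toy data. It assumes "P. Corr." and "Cos Sim." are these usual quantities. Which vectors the dashboard actually compares is not stated in the source, and the helper names are hypothetical.

```python
import numpy as np

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def cosine(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity: no centering, so a constant offset changes the result."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Toy data (illustrative only): b is a shifted by a constant
a = np.array([1.0, 2.0, 3.0, 4.0])
b = a + 10.0

p = pearson(a, b)   # the shift is removed by centering
c = cosine(a, b)    # the shift is not removed
print(f"pearson={p:.3f} cosine={c:.3f}")
```

The two measures agree only when both vectors already have zero mean, so the two columns need not be close to each other.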
Negative Logits
inconce     -1.10
indestru    -1.03
disagre     -1.00
intrigu     -1.00
Mahomet     -0.97
unspeak     -0.96
Mlle        -0.95
reluct      -0.95
Gorb        -0.94
apprehen    -0.93
Positive Logits
(':     0.74
(":     0.69
('/:    0.67
:::     0.64
✨:      0.64
giuri   0.63
("/:    0.63
():     0.60
=:      0.60
}:      0.60
Activations Density 0.174%