INDEX
Explanations
descriptions of research studies and data analysis related to various topics
Neuron Alignment
Index   Value   % of L₁
198     +0.15   0.4%
1253    +0.13   0.4%
1177    +0.08   0.2%
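The "% of L₁" column can be read as each entry's share of the L₁ norm of the feature's weight vector. The page does not state this definition, so the sketch below is an assumption. The function name `top_aligned_neurons` and the toy vector are hypothetical and are not data from this page.

```python
def top_aligned_neurons(weights, k=3):
    """Return (index, value, percent_of_l1) for the k largest-magnitude entries.

    Assumption: "% of L1" means 100 * |w_i| / sum_j |w_j|.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []
    # Rank indices by absolute weight, largest first (ties keep index order).
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], 100.0 * abs(weights[i]) / l1) for i in ranked[:k]]


# Hypothetical toy vector for illustration only.
toy = [0.5, -0.25, 0.0, 0.25]
rows = top_aligned_neurons(toy, k=2)
for index, value, pct in rows:
    print(f"{index}\t{value:+.2f}\t{pct:.1f}%")
```

Under this reading, small percentages such as those in the table suggest the feature's weight is spread across many neurons rather than concentrated in a few.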
Correlated Neurons
Index   P. Corr.   Cos Sim.
1499    +0.15      0.04
198     +0.13      0.04
1379    +0.08      0.03
Negative Logits
Token           Logit
PerformLayout   -0.57
weile           -0.54
ѝ               -0.52
ECONDS          -0.52
webtoken        -0.52
‼               -0.51
fél             -0.50
ubscribe        -0.50
codile          -0.49
człowie         -0.49
Positive Logits
Token      Logit
increa     1.65
depic      1.65
reluct     1.64
maneu      1.61
encomp     1.58
guarante   1.58
disagre    1.57
apprehen   1.55
attemp     1.54
inev       1.50
Activations Density: 0.274%
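Activation density is commonly taken to be the fraction of tokens on which a feature fires. The page does not define it, so the sketch below assumes that reading. The function name `activation_density`, the `threshold` parameter, and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > 0.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy activations for illustration only.
toy_acts = [0.0, 0.0, 1.2, 0.0]
density = activation_density(toy_acts)
print(f"{100 * density:.3f}%")
```

On this reading, a density of 0.274% would mean the feature is active on roughly 1 in 365 tokens.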