Explanations
phrases that convey complexity or depth in discussion
Neuron Alignment
Index    Value    % of L₁
50       +0.27    1.2%
1013     +0.10    0.5%
2015     +0.10    0.5%
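The "% of L₁" column is not defined on the page. One plausible reading, stated here as an assumption, is that it reports how much of the feature vector's L₁ norm a single neuron's value accounts for. A minimal sketch of that arithmetic follows; the function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Fraction of a vector's L1 norm carried by one entry.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The dashboard
    itself does not confirm this definition.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("vector has zero L1 norm")
    return abs(weights[index]) / l1_norm


# Hypothetical toy vector, NOT the real feature direction.
toy_vector = [0.27, -0.10, 0.10, 0.03]
print(f"{l1_share(toy_vector, 0):.1%}")
```

Under this reading, a value of +0.27 at 1.2% would imply a total L₁ norm in the low twenties, which is roughly consistent with the +0.10 rows at 0.5% once rounding is taken into account.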
Correlated Neurons
Index    P. Corr.    Cos Sim.
1871     +0.27       0.08
1380     +0.10       0.02
1839     +0.10       0.12
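"P. Corr." and "Cos Sim." presumably stand for Pearson correlation and cosine similarity. The page does not say which vectors each statistic is computed on, so the sketch below only illustrates the two standard definitions on hypothetical toy data. It also shows why the two columns can disagree: Pearson correlation is cosine similarity applied after subtracting each vector's mean.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity(
        [x - mean_a for x in a],
        [y - mean_b for y in b],
    )


# Hypothetical toy data, NOT values from the dashboard.
x = [1.0, 2.0, 3.0]
y = [11.0, 12.0, 13.0]
print(pearson_correlation(x, y), cosine_similarity(x, y))
```

Here `y` is `x` shifted by a constant, so the two vectors are perfectly correlated even though they do not point in exactly the same direction.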
Negative Logits
<bos>             -1.42
SEDS              -0.90
bronz             -0.81
íí                -0.79
ാൻ                -0.75
CppCodeGen        -0.74
RegressionTest    -0.74
demografica       -0.74
prioritize        -0.73
intptr            -0.72
Positive Logits
shenan       1.47
milf         1.46
wikihow      1.46
hentai       1.40
simpsons     1.28
genshin      1.27
felipe       1.25
:'(          1.25
lmfao        1.25
destinées    1.24
Activations Density: 11.020%
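The page does not define activation density. A common reading, taken here as an assumption, is the share of sampled tokens on which the feature's activation is non-zero. A minimal sketch with hypothetical names and toy data:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above `threshold`.

    Assumption: density = (tokens where the feature fires) / (all tokens).
    The dashboard does not state the threshold or the sample it uses.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy activations over ten tokens, NOT real dashboard data.
toy_activations = [0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{activation_density(toy_activations):.3%}")
```

On this reading, a density of 11.020% would mean the feature fires on roughly one token in nine of whatever sample the dashboard used.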