Explanations
terms related to educational content and public inquiries
Neuron Alignment
Index   Value   % of L₁
50      +0.19   0.7%
1842    +0.15   0.5%
1577    +0.12   0.4%
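The page does not define "% of L₁". One plausible reading is that each entry's absolute value is divided by the L₁ norm of the whole weight vector. The sketch below ranks entries under that assumption. The function name `top_l1_shares` and the toy vector are hypothetical.

```python
# Minimal sketch, assuming "% of L1" means |w_i| / sum_j |w_j| for some
# weight vector w of this feature (the page does not define the column).

def top_l1_shares(weights, k=3):
    """Return (index, value, share_of_l1) for the k largest-|value| entries."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("weights must not be all zero")
    # Sort indices by magnitude, largest first; ties keep their original order.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:k]]


if __name__ == "__main__":
    toy = [0.0, 0.5, -0.25, 0.25]  # hypothetical toy vector
    for idx, val, share in top_l1_shares(toy, k=2):
        print(f"{idx:<6}{val:+.2f}   {share:.1%}")
```

Under this reading, small percentages such as 0.7% mean that no single dimension dominates and the weight is spread across many of them.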
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1870    +0.19           0.07
1013    +0.15           0.10
1435    +0.12           0.06
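Neither column is defined on the page. This sketch assumes that "P. Corr." is the Pearson correlation between this feature's activations and a neuron's activations over the same tokens, and that "Cos Sim." is the cosine similarity between their weight vectors. All inputs are hypothetical.

```python
import math

# Assumptions (not defined on the page): "P. Corr." = Pearson correlation of
# feature vs. neuron activations over the same tokens; "Cos Sim." = cosine
# similarity of their weight vectors.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("correlation undefined for constant input")
    return sum(a * b for a, b in zip(dx, dy)) / den


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if den == 0:
        raise ValueError("cosine similarity undefined for a zero vector")
    return sum(a * b for a, b in zip(u, v)) / den


if __name__ == "__main__":
    feature_acts = [0.0, 1.0, 2.0, 3.0]  # hypothetical feature activations
    neuron_acts = [1.0, 3.0, 5.0, 7.0]   # hypothetical neuron activations
    print(pearson(feature_acts, neuron_acts))
    print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))
```

The two measures can disagree because they capture different things. Neuron 1013, for example, has a lower correlation than 1870 but a higher cosine similarity. A neuron can fire together with the feature without its weights pointing in the same direction.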
Negative Logits
<bos>        -1.74
<?           -1.06
             -1.00
ⓧ            -0.92
/***         -0.85
<?           -0.83
/**          -0.79
<!--         -0.75
/*!          -0.74
Dá           -0.74
Positive Logits
unce         1.55
squa         1.54
inext        1.50
secon        1.43
sovere       1.40
unwarran     1.35
Shakspeare   1.35
unlaw        1.30
paradiso     1.30
oleo         1.29
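Lists like these are consistent with projecting the feature's output direction onto the unembedding matrix and then reading off the most negative and most positive token scores. The sketch below illustrates that idea under stated assumptions. It ignores any final normalization, and the two-dimensional direction, the three-token vocabulary, and both function names are hypothetical.

```python
# Simplified "logit lens" style sketch (assumption: the page's logit lists come
# from projecting the feature's output direction onto the unembedding; any
# final layer norm or scaling is ignored here).

def logit_contributions(direction, unembed):
    """Dot the direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(direction, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(scores, k):
    """Return (k most positive, k most negative) (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    return ranked[::-1][:k], ranked[:k]


if __name__ == "__main__":
    direction = [1.0, -1.0]  # hypothetical feature direction
    unembed = {"a": [2.0, 0.0], "b": [0.0, 2.0], "c": [1.0, 1.0]}  # hypothetical
    positive, negative = top_and_bottom(logit_contributions(direction, unembed), k=1)
    print("positive:", positive)
    print("negative:", negative)
```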
Activation Density: 1.684%
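Assuming density is the fraction of sampled tokens on which the feature's activation is above zero, 1.684% corresponds to about 17 of every 1,000 tokens. The page does not state the token sample or the firing threshold. The function name and toy activations below are hypothetical.

```python
# Minimal sketch, assuming "density" = share of sampled tokens whose feature
# activation exceeds a threshold (0 by default). The page does not state the
# sample or the threshold.

def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    acts = [0.0, 0.3, 0.0, 0.0, 1.2]  # hypothetical activations on 5 tokens
    print(f"{activation_density(acts):.3%}")
```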