INDEX
Explanations
lengthy descriptions and stories with a professional or formal tone
Neuron Alignment
Index    Value    % of L₁
266      +0.08    0.3%
1233     +0.07    0.3%
1612     +0.07    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1612     +0.08       0.03
266      +0.07       0.02
1233     +0.07       0.02
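The "P. Corr." and "Cos Sim." columns report two different similarity measures. As a minimal sketch, assuming "P. Corr." is a Pearson correlation between two activation series and "Cos Sim." is a cosine similarity between two vectors (the page does not state which quantities it compares), the two measures could be computed as below. The function names and the toy inputs are hypothetical and are not taken from the dashboard.

```python
import math

def pearson_corr(x, y):
    """Pearson correlation of two equal-length sequences (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Center both series, then normalize the covariance by both spreads.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else 0.0

def cosine_sim(u, v):
    """Cosine similarity of two equal-length vectors (hypothetical helper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy data only: illustrative values, not the dashboard's activations or weights.
feature_acts = [0.0, 1.0, 0.0, 2.0]
neuron_acts = [0.1, 0.9, 0.2, 1.8]
r = pearson_corr(feature_acts, neuron_acts)
c = cosine_sim([1.0, 0.0], [0.0, 1.0])
```

Pearson correlation centers each series before comparing them, while cosine similarity does not, so the two columns can differ even when computed on the same pair of vectors.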
Negative Logits
<bos>               -0.86
public              -0.65
//                  -0.63
<?                  -0.63
regulate            -0.62
/***                -0.62
(blank in source)   -0.61
can                 -0.61
also                -0.59
port                -0.59
Positive Logits
meaningful      2.30
meaningfully    1.74
fatis           1.72
maroc           1.62
tramont         1.55
mosso           1.54
ibiza           1.53
stockholm       1.51
imposs          1.51
casio           1.50
Activations Density: 0.119%