INDEX
Explanations
references to TV shows and popular culture
Neuron Alignment
Index   Value   % of L₁
50      +0.15   0.6%
689     +0.11   0.4%
1183    +0.09   0.3%
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
138     +0.15           0.09
1493    +0.11           0.08
2010    +0.09           0.08
Negative Logits
Token     Logit
ⓧ         -1.10
          -0.98
/**       -0.87
<?        -0.82
<?        -0.69
<bos>     -0.64
/*        -0.63
#         -0.60
jakarta   -0.58
/***      -0.57
Positive Logits
Token     Logit
Bagdad    1.09
Juf       1.06
Khart     0.99
Amerik    0.98
Nguy      0.94
thuy      0.94
lele      0.94
Keny      0.94
Karang    0.93
panik     0.93
Activations Density: 0.604%