Explanations
phrases related to being recognized or well-known
Neuron Alignment
Index    Value    % of L₁
227      +0.10    0.3%
198      +0.09    0.3%
964      +0.09    0.2%
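The "% of L₁" column is not defined on this page. A minimal sketch of one plausible reading, assuming it is the absolute value of a neuron's entry divided by the L1 norm of the whole alignment vector, is given here. The function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights):
    """Return each entry's share of the vector's L1 norm, in percent.

    Assumption: "% of L1" means |w_i| / sum_j |w_j| * 100.
    """
    total = sum(abs(w) for w in weights)
    if total == 0:
        # An all-zero vector has no meaningful shares.
        return [0.0 for _ in weights]
    return [100.0 * abs(w) / total for w in weights]


# Hypothetical toy vector, not real dashboard data.
toy = [0.10, -0.30, 0.05, 0.55]
shares = l1_share(toy)
```

Under this reading, a value of +0.10 accounting for 0.3% would imply an L1 norm of roughly 33 for the full vector, which is why each individual neuron contributes so little.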
Correlated Neurons
Index    P. Corr.    Cos Sim.
1468     +0.10       0.03
1265     +0.09       0.03
508      +0.09       0.04
Negative Logits
Eft           -0.75
Eccle         -0.71
coar          -0.70
leonardo      -0.67
«<            -0.66
<^            -0.66
jsonString    -0.65
thut          -0.65
Intere        -0.65
edp           -0.64
Positive Logits
unfamiliar     0.72
familiar       0.69
familiarity    0.65
familiar       0.59
obscure        0.57
acquainted     0.56
familiarize    0.54
know           0.54
know           0.52
wikipedia      0.49
Activations Density: 0.403%
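The page does not say how the density figure is computed. A minimal sketch, assuming it is the percentage of sampled tokens on which the feature's activation is above zero, is given here. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature fires at all.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy sample: 1000 tokens, 5 of them active.
toy = [0.0] * 995 + [0.2, 1.3, 0.7, 0.1, 2.4]
density = activation_density(toy)
```

On that reading, 0.403% would mean the feature fires on about 4 of every 1000 sampled tokens.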