INDEX
Explanations
phrases that denote relationships between individuals or entities
Neuron Alignment
Index   Value   % of L₁
50      +0.27   1.2%
1343    +0.15   0.7%
227     +0.13   0.6%
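The "% of L₁" column reads as each neuron's share of the L₁ norm of the feature's weights over all neurons. A minimal sketch of that computation follows. It assumes the "Value" column is the feature's decoder weight on that MLP neuron, and `neuron_alignment` is a hypothetical helper, not the dashboard's own code.

```python
import numpy as np


def neuron_alignment(decoder_row: np.ndarray, k: int = 3) -> list[tuple[int, float, float]]:
    """Hypothetical helper: top-k neurons by |weight| in a feature's decoder row.

    Assumes `decoder_row` holds the feature's decoder weights in the
    MLP-neuron basis (one entry per neuron).
    """
    magnitudes = np.abs(decoder_row)
    l1_norm = magnitudes.sum()
    # Indices sorted by descending magnitude; keep the first k.
    top = np.argsort(-magnitudes)[:k]
    # Each row: (neuron index, signed weight, share of the L1 norm in percent).
    return [(int(i), float(decoder_row[i]), float(100.0 * magnitudes[i] / l1_norm)) for i in top]


rows = neuron_alignment(np.array([0.25, -0.5, 0.125, 0.125]), k=2)
print(rows)  # [(1, -0.5, 50.0), (0, 0.25, 25.0)]
```

Ranking by absolute value keeps strongly negative weights visible, while the signed weight is still reported in the second column.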
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1343    +0.27           0.05
1930    +0.15           0.04
227     +0.13           0.05
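The two columns compare the feature's activations with each neuron's activations across a set of tokens. A minimal sketch, assuming "P. Corr." is the Pearson correlation (cosine similarity of mean-centred activation vectors) and "Cos Sim." is the cosine similarity of the raw, uncentred vectors; `correlated_neurons` and its arguments are hypothetical names.

```python
import numpy as np


def correlated_neurons(feat_acts, neuron_acts, k=3, eps=1e-8):
    """Hypothetical helper: neurons whose activations best track a feature's.

    feat_acts:   shape (n_tokens,), the feature's activation on each token.
    neuron_acts: shape (n_tokens, n_neurons), neuron activations on the same tokens.
    """
    feat_acts = np.asarray(feat_acts, dtype=float)
    neuron_acts = np.asarray(neuron_acts, dtype=float)
    # Pearson correlation: cosine similarity of the mean-centred vectors.
    f_c = feat_acts - feat_acts.mean()
    n_c = neuron_acts - neuron_acts.mean(axis=0)
    pearson = (f_c @ n_c) / (np.linalg.norm(f_c) * np.linalg.norm(n_c, axis=0) + eps)
    # Cosine similarity of the raw (uncentred) activation vectors.
    cos_sim = (feat_acts @ neuron_acts) / (
        np.linalg.norm(feat_acts) * np.linalg.norm(neuron_acts, axis=0) + eps
    )
    # Rank neurons by Pearson correlation, highest first.
    top = np.argsort(-pearson)[:k]
    return [(int(i), float(pearson[i]), float(cos_sim[i])) for i in top]


feat = np.array([1.0, 2.0, 3.0, 4.0])
neurons = np.stack([feat, feat[::-1], np.array([1.0, 1.0, 2.0, 2.0])], axis=1)
rows = correlated_neurons(feat, neurons)
print(rows)
```

The small `eps` only guards against division by zero for a neuron that never varies; it shifts the scores negligibly otherwise.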
Negative Logits
Token       Logit
<bos>       -3.09
expand      -0.70
to          -0.70
we          -0.70
introduce   -0.69
they        -0.68
hold        -0.67
let         -0.67
bring       -0.66
have        -0.66
Positive Logits
Token       Logit
véhic       1.65
Juf         1.61
quoique     1.53
Keny        1.53
mikrofon    1.51
soulign     1.50
affor       1.49
Kategor     1.49
silikon     1.49
maksi       1.49
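The negative and positive logit lists rank vocabulary tokens by how much the feature's output direction pushes their logits down or up. A minimal sketch of that direct effect, assuming it is the feature's decoder direction multiplied by the model's unembedding matrix; `logit_effects`, `decoder_dir`, `W_U`, and `vocab` are hypothetical names, and this sketch ignores any final LayerNorm scaling.

```python
import numpy as np


def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: tokens whose logits a feature most lowers / raises.

    decoder_dir: shape (d_model,), the feature's output direction.
    W_U:         shape (d_model, d_vocab), the unembedding matrix.
    vocab:       list of d_vocab token strings.
    """
    # Direct effect of the feature direction on every vocabulary logit.
    logits = np.asarray(decoder_dir) @ np.asarray(W_U)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


W_U = np.array([[0.5, -2.0, 1.5, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = logit_effects(np.array([1.0, 0.0]), W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('b', -2.0), ('d', 0.0)]
print(pos)  # [('c', 1.5), ('a', 0.5)]
```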
Activation Density: 0.145%
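Activation density is the percentage of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold the dashboard actually uses is not stated); `activation_density` is a hypothetical helper.

```python
import numpy as np


def activation_density(feat_acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    The zero threshold is an assumption, not taken from the dashboard.
    """
    feat_acts = np.asarray(feat_acts, dtype=float)
    return 100.0 * float(np.mean(feat_acts > threshold))


print(activation_density([0.0, 0.0, 0.3, 0.0]))  # 25.0
# At 0.145% density, about 1,450 of every 1,000,000 tokens activate the feature.
```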