INDEX
Explanations
phrases emphasizing the role or importance of something
Neuron Alignment
Index   Value   % of L₁
50      +0.40   1.9%
752     +0.17   0.8%
1967    +0.10   0.5%
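The "% of L₁" column is not defined on the page. One plausible reading, stated here as an assumption, is that it gives each neuron's share of the L₁ norm of the feature's alignment vector. A minimal Python sketch under that assumption backs out the implied L₁ norm from each row; the function name `implied_l1_norm` is hypothetical.

```python
# Rows copied from the Neuron Alignment table: (neuron index, value, % of L1).
rows = [(50, 0.40, 1.9), (752, 0.17, 0.8), (1967, 0.10, 0.5)]

def implied_l1_norm(value, pct_of_l1):
    """Back out the L1 norm, ASSUMING pct_of_l1 = 100 * |value| / L1 norm."""
    return abs(value) * 100.0 / pct_of_l1

# If the assumption holds, every row should imply roughly the same norm.
implied = {idx: implied_l1_norm(value, pct) for idx, value, pct in rows}
for idx, norm in implied.items():
    print(idx, round(norm, 1))
```

All three rows imply a norm of roughly 20 to 21, which is consistent with this reading given that the displayed values are rounded. It does not confirm it.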
Correlated Neurons
Index   P. Corr.   Cos Sim.
752     +0.40      0.07
1034    +0.17      0.04
16      +0.10      0.05
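The two columns measure different things, which may explain why a neuron can show a moderate correlation but a near-zero cosine similarity. The sketch below assumes "P. Corr." is the Pearson correlation between feature and neuron activations over tokens, and "Cos Sim." is the cosine similarity between their weight-space directions. Neither definition is stated on the page, and all data in the example is hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centered series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])

# Hypothetical toy data, not taken from the page.
feature_acts = [0.0, 0.0, 1.0, 3.0]  # feature activation on four tokens
neuron_acts = [0.1, 0.0, 0.9, 3.2]   # neuron activation on the same tokens
feature_dir = [1.0, 0.0, 0.0, 0.0]   # feature direction in weight space
neuron_dir = [0.1, 0.9, 0.3, 0.3]    # neuron direction in weight space

act_corr = pearson(feature_acts, neuron_acts)  # close to 1 for this toy data
dir_cos = cosine(feature_dir, neuron_dir)      # about 0.1 for this toy data
```

In this toy case the activations move together almost perfectly while the directions are nearly orthogonal, so the two numbers need not agree.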
Negative Logits
<bos>             -2.98
ⓧ                 -0.75
/**               -0.66
<?                -0.63
public            -0.61
/***              -0.61
SequentialGroup   -0.61
,                 -0.60
立                -0.60
aren              -0.60
Positive Logits
milano     1.52
considér   1.46
santiago   1.45
eiffel     1.44
bandung    1.41
napoli     1.40
Juf        1.39
véhic      1.35
écout      1.34
hcm        1.34
Activations Density: 0.314%
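Activation density is commonly read as the fraction of tokens on which the feature fires at all. Assuming that definition (the page does not state it), a density of 0.314% corresponds to roughly 3 active tokens per 1,000. A minimal sketch, with a hypothetical helper name and hypothetical sample data:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds threshold (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: 1,000 tokens, 3 of which activate the feature.
sample = [0.0] * 997 + [0.5, 1.2, 2.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```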