INDEX
Explanations
phrases centered around the concept of discussing or talking about various topics
Neuron Alignment
Index   Value   % of L₁
50      +0.34   1.7%
938     +0.11   0.5%
866     +0.10   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
938     +0.34      0.05
1573    +0.11      0.04
866     +0.10      0.03
Negative Logits
<bos>   -2.67
<?      -0.74
/**     -0.71
/***    -0.68
ⓧ       -0.68
///**   -0.66
        -0.61
font    -0.59
/*++    -0.57
<?      -0.57
Positive Logits
bandung   1.43
sovere    1.36
Minang    1.35
lidl      1.32
Juf       1.32
autunno   1.31
eiffel    1.30
affor     1.28
Czechos   1.27
milano    1.27
Activations Density 0.161%