INDEX
Explanations
phrases that describe the characteristics of a thing or concept
Neuron Alignment
Index   Value   % of L₁
50      +0.22   0.7%
198     +0.10   0.3%
1150    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1315    +0.22      0.04
872     +0.10      0.04
386     +0.10      0.02
Negative Logits
<bos>            -1.99
MessageOf        -0.73
<?               -0.55
+:+              -0.51
<!--             -0.50
InjectMocks      -0.50
ratify           -0.49
ⓧ                -0.49
ANDUM            -0.49
invokingState    -0.48
Positive Logits
bandung          1.12
jawa             1.03
jaya             1.01
Minang           0.97
Banjar           0.97
maneu            0.90
Karang           0.90
cartier          0.89
Jambi            0.88
🤣🤣               0.85
Activations Density: 0.172%