INDEX
Explanations
phrases expressing confusion or lack of understanding
Neuron Alignment
Index   Value   % of L₁
50      +0.21   0.8%
1601    +0.10   0.4%
397     +0.09   0.4%
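The "% of L₁" column is not defined on this page. One plausible reading, stated here as an assumption, is that each neuron's value is divided by the L₁ norm (the sum of absolute values) of the whole vector it belongs to. A minimal sketch of that reading follows; the helper name `percent_of_l1` and the toy weights are hypothetical and are not taken from the feature above.

```python
def percent_of_l1(weights, index):
    """Share of a vector's L1 norm carried by one entry, in percent.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The page does not
    confirm this definition.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector with illustrative values only.
toy_weights = [0.5, -0.25, 0.25]
share = percent_of_l1(toy_weights, 0)
```

Under this reading, small percentages such as those in the table would indicate that no single neuron dominates the vector.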
Correlated Neurons
Index   P. Corr.   Cos Sim.
1601    +0.21      0.05
1372    +0.10      0.05
581     +0.09      0.04
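"P. Corr." and "Cos Sim." presumably abbreviate Pearson correlation and cosine similarity. The page does not say which vectors are compared, so the sketch below only shows the two standard definitions on toy data; the function names and inputs are hypothetical.

```python
import math


def pearson_corr(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cosine_sim(xs, ys):
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    return dot / (norm_x * norm_y)


# Toy inputs (illustrative only, not from the dashboard).
r = pearson_corr([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
c = cosine_sim([1.0, 0.0], [0.0, 1.0])
```

The two measures differ in that Pearson correlation subtracts the mean before comparing, so they can disagree on the same pair of vectors, as the differing columns in the table suggest.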
Negative Logits
<bos>            -2.14
jakarta          -0.51
///**            -0.47
AppCompatTheme   -0.47
ensure           -0.45
pessi            -0.44
enri             -0.44
lineto           -0.42
hydrate          -0.41
inject           -0.41
Positive Logits
cajones       0.72
postolic      0.72
conclud       0.66
explanation   0.65
capulco       0.65
soggior       0.64
puzzling      0.63
sappi         0.62
hecta         0.62
fathoms       0.62
Activations Density 0.315%
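The density figure is not defined on this page. A common reading, taken here as an assumption, is the percentage of sampled tokens on which the feature's activation is above zero. A minimal sketch with a hypothetical helper `activation_density` and toy activations:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens on which the
    feature fires. The page does not confirm this definition.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy activations (illustrative only): one active position out of four.
toy_activations = [0.0, 0.0, 1.3, 0.0]
density = activation_density(toy_activations)
```

Under this reading, a density of 0.315% would mean the feature is active on roughly 3 of every 1,000 tokens.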