INDEX
Explanations
phrases related to demonstrating, showing, or highlighting something to others
Neuron Alignment
Index   Value   % of L₁
50      +0.19   1.1%
1482    +0.14   0.8%
966     +0.13   0.7%
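The "% of L₁" column can be read as each listed neuron's share of the L₁ norm of the feature's decoder vector over MLP neurons. A minimal sketch, assuming that definition and assuming neurons are ranked by signed decoder weight (the table shows only positive values). `neuron_alignment` and the toy row are hypothetical, not the dashboard's own code:

```python
def neuron_alignment(decoder_row, k=3):
    """Hypothetical helper: top-k neurons by signed decoder weight, each paired
    with that weight's share of the row's L1 norm, as a percentage."""
    l1 = sum(abs(w) for w in decoder_row)  # L1 norm of the decoder vector
    if l1 == 0:
        raise ValueError("decoder row is all zeros")
    # Rank neuron indices by signed weight, largest first (assumed ordering).
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], 100 * abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy 5-neuron decoder row (made-up numbers, not taken from this feature).
row = [0.02, 0.19, -0.05, 0.14, 0.60]
for index, value, pct in neuron_alignment(row, k=2):
    print(f"{index:>5}  {value:+.2f}  {pct:.1f}%")
```

Because the percentage is taken over the whole vector's L₁ norm, small shares such as 1.1% indicate that the decoder direction is spread across many neurons rather than concentrated in one.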
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
966     +0.19           0.06
869     +0.14           0.07
1482    +0.13           0.06
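Pearson correlation is cosine similarity applied after mean-centring both series, so the two columns can diverge. A sketch assuming both columns compare the feature's activations with a neuron's activations over the same tokens. This is an assumption, since the page does not say which vectors the cosine is taken over. `pearson`, `cosine`, and the toy activations are hypothetical:

```python
import math


def cosine(xs, ys):
    """Cosine similarity of two equal-length vectors (no centring)."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    if norm == 0:
        raise ValueError("zero-norm vector")
    return dot / norm


def pearson(xs, ys):
    """Pearson correlation: cosine similarity of the mean-centred series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return cosine([x - mx for x in xs], [y - my for y in ys])


# Hypothetical activations of one feature and one neuron over the same five tokens.
feature = [0.0, 0.0, 2.1, 0.0, 1.4]
neuron = [-0.1, 0.0, 0.9, -0.2, 0.5]
print(f"P. Corr. {pearson(feature, neuron):+.2f}  Cos Sim. {cosine(feature, neuron):.2f}")
```

A constant offset in one series leaves Pearson correlation unchanged but lowers cosine similarity, which is one way the two columns can disagree.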
Negative Logits
<bos>              -3.13
(not rendered)     -0.91
ⓧ                  -0.91
<?                 -0.80
/***               -0.72
/**                -0.71
<?                 -0.70
//{                -0.63
updateUI           -0.62
/*                 -0.61
Positive Logits
Shows       1.13
Showing     1.12
Shows       1.10
SHOWS       1.09
thut        1.08
bandung     1.05
showing     1.05
SHOW        1.04
shows       1.03
embodi      1.02
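Assuming the logit tables follow the common SAE-dashboard convention of projecting the feature's decoder direction through the unembedding matrix W_U (and ignoring the final LayerNorm), the positive and negative lists are the highest- and lowest-scoring vocabulary entries. A minimal sketch under that assumption. `logit_effects` and the toy matrices are hypothetical:

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Hypothetical helper: score every token as decoder_dir @ W_U and return
    the k highest (positive logits) and k lowest (negative logits). Assumes k >= 1.

    decoder_dir: list of d_model floats (the feature's decoder direction)
    unembed:     W_U as d_model rows, each with one entry per vocab token
    vocab:       token strings, one per column of W_U
    """
    scores = [
        sum(decoder_dir[d] * unembed[d][t] for d in range(len(decoder_dir)))
        for t in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy model with d_model = 2 and a 3-token vocabulary (made-up numbers).
W_U = [[1.0, -2.0, 0.5],
       [9.0, 9.0, 9.0]]
pos, neg = logit_effects([1.0, 0.0], W_U, ["Shows", "<bos>", "x"], k=1)
print(pos, neg)
```

Tokens that differ only in a leading space are separate vocabulary entries, which is why near-identical strings can appear twice in one list.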
Activations Density 0.244%
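Assuming activation density means the percentage of token positions where the feature's activation is strictly positive, 0.244% corresponds to roughly one token in 410. A minimal sketch of that calculation. `activation_density` is a hypothetical helper name:

```python
def activation_density(activations):
    """Percent of token positions where the feature fires (activation > 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100 * firing / len(activations)


print(activation_density([0.0, 0.0, 1.5, 0.0]))  # 25.0
```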