INDEX
Explanations
phrases related to interviews and conversations
Neuron Alignment
Index   Value   % of L₁
1150    +0.14   0.4%
1013    +0.10   0.3%
906     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1150    +0.14      0.04
1800    +0.10      0.06
1823    +0.09      0.04
Negative Logits
kasa      -1.11
lele      -1.09
hina      -1.04
toscana   -1.02
susun     -1.01
umo       -1.01
sement    -0.99
kug       -0.97
milano    -0.96
bandung   -0.95
Positive Logits
<bos>          1.14
recalled       0.75
recalls        0.70
recall         0.66
recalling      0.63
recounted      0.62
recollection   0.61
recount        0.60
describe       0.59
explained      0.59
Activations Density: 0.341%