Explanations
proper nouns related to characters or names in a narrative
Neuron Alignment
Index   Value   % of L₁
50      +0.17   0.7%
1741    +0.15   0.6%
1097    +0.10   0.4%
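The dashboard does not define this table. One plausible reading is that "Value" is a raw entry of the feature's weight vector in the neuron basis and "% of L₁" is that entry's magnitude as a share of the vector's L₁ norm. A minimal Python sketch of that reading follows. The helper `neuron_alignment` and the toy weights are hypothetical, and ranking by magnitude is an assumption.

```python
# Hypothetical sketch: assumes "Value" is a raw entry of the feature's weight
# vector in the neuron basis, and "% of L1" is that entry's magnitude divided
# by the vector's L1 norm. Ranking by magnitude is also an assumption.

def neuron_alignment(weights, k=3):
    """Return (index, value, percent_of_l1) for the k largest-magnitude entries."""
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        return []  # an all-zero vector has no meaningful alignment
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], 100.0 * abs(weights[i]) / l1_norm) for i in ranked[:k]]


# Toy 6-neuron vector with made-up values (not this feature's weights).
toy_weights = [0.05, -0.40, 0.10, 0.25, 0.0, -0.20]
for index, value, pct in neuron_alignment(toy_weights):
    print(f"{index:>4}  {value:+.2f}  {pct:.1f}%")
```

Because each percentage is taken against the whole vector's L₁ norm, small shares such as 0.7% indicate that the feature is spread across many neurons rather than concentrated in one.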
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1097    +0.17           0.08
1741    +0.15           0.01
736     +0.10           0.05
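The two columns most likely differ in centering. Pearson correlation subtracts each series' mean before comparing, and cosine similarity does not. The sketch below assumes both are computed over the same per-token activation samples of the feature and the neuron. The helpers `pearson_corr` and `cosine_sim` and the toy data are hypothetical.

```python
import math

# Hypothetical sketch: assumes both columns compare the same per-token
# activation samples of the feature and of one neuron.

def cosine_sim(xs, ys):
    """Uncentered cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs) * sum(y * y for y in ys))
    return dot / norm if norm else 0.0

def pearson_corr(xs, ys):
    """Pearson correlation: the cosine similarity of the mean-centered samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return cosine_sim([x - mean_x for x in xs], [y - mean_y for y in ys])


# Made-up activations over five tokens; the neuron idles at a negative baseline.
feature_acts = [0.0, 0.0, 2.0, 0.0, 1.0]
neuron_acts = [-1.0, -1.0, 1.0, -1.0, 0.0]
print(f"Pearson {pearson_corr(feature_acts, neuron_acts):+.2f}, "
      f"cosine {cosine_sim(feature_acts, neuron_acts):.2f}")
```

In this toy data the neuron tracks the feature exactly up to a constant offset. Pearson is therefore 1.00, while the uncentered cosine is only about 0.45.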
Negative Logits
<bos>             -1.72
<?                -0.99
ⓧ                 -0.92
(not rendered)    -0.92
/***              -0.85
katun             -0.82
habited           -0.81
€/                -0.81
manuten           -0.79
<!--              -0.79
Positive Logits
disreg            0.93
shenan            0.89
ineffec           0.86
maneu             0.83
impra             0.82
stickied          0.81
starbucks         0.81
scottish          0.80
brooklyn          0.78
lmfao             0.77
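Both logit lists can be read as the two ends of a single ranking over the vocabulary. This sketch assumes each score is the dot product of the feature's output direction with one token's column of the unembedding matrix. The dashboard does not state this, and it may also apply a scaling step (such as a final LayerNorm) that the sketch omits. The helpers `logit_effects` and `top_and_bottom`, and all numbers, are hypothetical.

```python
# Hypothetical sketch: assumes each logit score is direction . unembed[:, j].
# Any final-LayerNorm scaling a real dashboard might apply is ignored here.

def logit_effects(direction, unembed, vocab):
    """Score each token: direction (d_model) dotted with column j of unembed (d_model x d_vocab)."""
    return [
        (token, sum(direction[i] * unembed[i][j] for i in range(len(direction))))
        for j, token in enumerate(vocab)
    ]

def top_and_bottom(scores, k):
    """Split (token, score) pairs into the k highest and the k lowest scores."""
    ordered = sorted(scores, key=lambda pair: pair[1])
    return ordered[::-1][:k], ordered[:k]


# Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
direction = [1.0, -0.5]
unembed = [
    [0.2, -0.3, 0.9, 0.0],
    [0.4, 0.6, -0.2, 1.0],
]
vocab = ["a", "b", "c", "d"]
positive, negative = top_and_bottom(logit_effects(direction, unembed, vocab), k=2)
print("positive:", [(t, round(s, 2)) for t, s in positive])
print("negative:", [(t, round(s, 2)) for t, s in negative])
```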
Activation Density: 0.307%
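Suppose density means the percentage of sampled tokens on which the feature's activation is positive. This is an assumption, because the threshold is not stated. Under that reading, 0.307% corresponds to roughly 3 firing tokens per 1,000. A minimal sketch with a hypothetical `activation_density` helper:

```python
# Hypothetical sketch: assumes density is the share of sampled tokens on
# which the feature's activation exceeds a threshold (here, strictly > 0).

def activation_density(acts, threshold=0.0):
    """Percentage of activations strictly above `threshold`."""
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Made-up sample: 3 firing tokens out of 1,000.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample):.3f}%")  # prints 0.300%
```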