INDEX
Explanations
proper nouns and names related to characters, organizations, and events in a story or text
Neuron Alignment
Index   Value   % of L₁
50      +0.14   0.9%
131     +0.14   0.9%
553     +0.13   0.8%
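The "% of L₁" column can be read as each value's share of the L₁ norm of the full weight vector. The page does not define the column, so this reading is an assumption. Below is a minimal Python sketch under that assumption; the function name `percent_of_l1` and the toy vector are hypothetical and are not taken from the dashboard data.

```python
def percent_of_l1(weights):
    """Return each weight's absolute value as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means 100 * |w_i| / sum_j |w_j|.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; percentages are undefined")
    return [100.0 * abs(w) / l1_norm for w in weights]


# Hypothetical toy vector, not the real weights behind the table above.
toy_weights = [0.5, -0.25, 0.25]
shares = percent_of_l1(toy_weights)
```

Under this reading, small percentages for even the top-ranked neurons would mean the weight is spread thinly across many neurons rather than concentrated in a few.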
Correlated Neurons
Index   P. Corr.   Cos Sim.
553     +0.14      0.05
131     +0.14      0.05
1492    +0.13      0.04
Negative Logits
<bos>             -3.12
public            -0.78
GTCX              -0.74
;                 -0.70
displayquote      -0.69
AssemblyCompany   -0.69
</tbody>          -0.69
mergeFrom         -0.69
///               -0.68
nav               -0.68
Positive Logits
Minang     2.08
increa     2.06
maneu      2.04
affor      2.02
Juf        1.98
emphat     1.91
peppa      1.88
bandung    1.87
guarante   1.86
thut       1.85
Activations Density 0.143%