INDEX
Explanations
names of people or characters with a possessive form
Neuron Alignment
Index   Value   % of L₁
50      +0.32   2.1%
1178    +0.10   0.6%
204     +0.09   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1485    +0.32      0.10
1178    +0.10      0.10
204     +0.09      0.10
Negative Logits
<bos>             -3.83
/**               -0.99
/*                -0.93
<?                -0.92
/***              -0.89
ⓧ                 -0.87
                  -0.85
Vegeu             -0.79
/*++              -0.78
AssemblyCompany   -0.76
Positive Logits
affor      1.93
increa     1.92
unlaw      1.90
Juf        1.85
reluct     1.84
impra      1.82
wherea     1.79
volunte    1.78
disagre    1.75
maneu      1.75
Activations Density 0.424%