Explanations
mentions of specific entities or organizations in the text
Neuron Alignment
Index   Value   % of L₁
478     +0.13   0.4%
1978    +0.11   0.3%
1741    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1984    +0.13      0.06
1654    +0.11      0.05
86      +0.10      0.02
Negative Logits
ayaa      -0.78
umo       -0.75
saad      -0.69
naer      -0.68
karna     -0.67
cabrio    -0.67
kark      -0.66
saha      -0.65
hej       -0.65
pank      -0.65
Positive Logits
itself         0.51
Skocz          0.51
'              0.50
Enregistrer    0.48
<bos>          0.48
’              0.48
Wtf            0.46
uksessa        0.43
$'             0.43
Xoxo           0.43
Activations Density 0.330%