INDEX
Explanations
mentions of the name "Sane" or variations thereof (New Auto-Interp)
Neuron Alignment
Index   Value   % of L₁
596     +0.17   1.0%
50      +0.16   1.0%
1562    +0.15   1.0%
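The Neuron Alignment table lists the neurons with the largest entries in this feature's decoder vector, together with each entry's share of that vector's L₁ norm. Below is a minimal sketch of how such a table could be computed. It assumes the decoder vector is written in the neuron basis and is available as a plain list of floats. The helper name `neuron_alignment` is hypothetical, and so is the choice to rank by signed weight rather than by magnitude.

```python
def neuron_alignment(decoder_row, k=3):
    """Hypothetical helper: top-k neurons by decoder weight for one feature.

    Assumes `decoder_row` is the feature's decoder vector in the neuron basis
    (one float per neuron). Returns (neuron index, weight, weight / L1 norm).
    """
    l1_norm = sum(abs(w) for w in decoder_row)
    # Assumption: rank by signed weight; a dashboard might rank by |weight|.
    top = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)[:k]
    return [(i, decoder_row[i], decoder_row[i] / l1_norm) for i in top]


row = [0.0, 0.5, -0.25, 0.25]
for index, value, frac in neuron_alignment(row, k=2):
    print(f"{index:>5}  {value:+.2f}  {frac:.1%}")
```

Under this reading, a weight of +0.17 at 1.0% of L₁ implies a decoder-row L₁ norm of roughly 17.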
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
1562    +0.17           0.02
596     +0.16           0.02
1624    +0.15           0.02
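The Correlated Neurons table compares this feature's activations with individual neurons' activations over a dataset. The sketch below follows one common convention, which may not be the dashboard's exact definition: the correlation column is the Pearson correlation of the two activation vectors (mean-centered), and the cosine-similarity column is the uncentered cosine similarity of the same vectors. All helper names are hypothetical.

```python
import math


def cosine(xs, ys):
    """Uncentered cosine similarity; raises ZeroDivisionError for an all-zero input."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / math.sqrt(sum(x * x for x in xs) * sum(y * y for y in ys))


def pearson(xs, ys):
    """Pearson correlation: cosine similarity after subtracting each vector's mean."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return cosine([x - mx for x in xs], [y - my for y in ys])


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Hypothetical helper: rank neurons by Pearson correlation with the feature.

    feature_acts: the feature's activation on each token.
    neuron_acts: neuron_acts[n][t] is neuron n's activation on token t.
    Returns (neuron index, Pearson correlation, cosine similarity) tuples.
    """
    rows = [(n, pearson(feature_acts, acts), cosine(feature_acts, acts))
            for n, acts in enumerate(neuron_acts)]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:k]
```

Centering is the only difference between the two measures, so the two columns can disagree when a neuron's activations have a large mean.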
Negative Logits
<bos>    -2.06
<?       -0.73
         -0.71
ⓧ        -0.66
maging   -0.64
/*       -0.63
/**      -0.63
liel     -0.61
būs      -0.60
bawat    -0.59
Positive Logits
Sa       1.49
Sa       1.38
sa       1.11
maer     1.05
Sá       1.04
sa       1.00
aen      0.98
glau     0.97
Sae      0.97
Saar     0.96
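The logit lists rank vocabulary tokens by how strongly this feature's output direction pushes their logits down (negative) or up (positive). Below is a minimal sketch of that ranking. It assumes the score is the direct path through the unembedding: the feature's direction, already mapped into the residual stream, is dotted with each column of the unembedding matrix, and the final layer norm is ignored. The helper `logit_effects` and its inputs are hypothetical.

```python
def logit_effects(direction, unembed, vocab, k=10):
    """Hypothetical helper: direct logit effect of one feature direction.

    direction: the feature's output direction in the residual stream (length d_model).
    unembed: unembed[d][v], a d_model x d_vocab matrix.
    vocab: token strings, one per column of `unembed`.
    Assumption: the final layer norm is ignored.
    Returns (k most negative, k most positive) lists of (token, score) pairs.
    """
    scores = [
        sum(direction[d] * unembed[d][v] for d in range(len(direction)))
        for v in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]
```

The ranking is over vocabulary entries, not over rendered strings. Two distinct tokens that display identically (for example, with and without a leading space) can therefore both appear, which may explain the repeated "Sa" and "sa" rows.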
Activation Density: 0.069%
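Activation density is presumably the percentage of dataset tokens on which the feature is active. Below is a minimal sketch. It assumes "active" means an activation strictly above zero, and the helper name is hypothetical.

```python
def activation_density(feature_acts, threshold=0.0):
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`.

    The zero threshold is an assumption, not a documented dashboard setting.
    """
    fired = sum(1 for a in feature_acts if a > threshold)
    return 100.0 * fired / len(feature_acts)
```

Under this definition, a density of 0.069% means the feature fires on about 69 of every 100,000 tokens.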