Explanations
instances of praise or accolades
Neuron Alignment
Index    Value    % of L₁
50       +0.17    0.9%
597      +0.13    0.7%
1535     +0.10    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
597      +0.17       0.07
197      +0.13       0.06
47       +0.10       0.05
Negative Logits
Token                 Logit
<bos>                 -3.40
ⓧ                     -0.89
(no visible token)    -0.75
<?                    -0.71
},[])                 -0.70
/**                   -0.64
ensure                -0.61
stabilize             -0.60
/*++                  -0.60
encourage             -0.60
Positive Logits
Token          Logit
dégust         1.14
outlander      1.13
soulign        1.11
véhic          1.09
!!</           1.09
maroc          1.04
renfer         1.04
gettyimages    1.04
catég          1.04
quoique        1.04
Activations Density 0.163%