Explanations
phrases related to people
Neuron Alignment
Index    Value    % of L₁
50       +0.24    1.8%
25       +0.12    0.9%
168      +0.12    0.9%
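The "% of L₁" column appears to give each neuron's absolute value as a share of the L₁ norm of the full vector. That reading is an assumption; the page itself does not define the column. The sketch below checks whether the three listed rows are consistent with it, using the hypothetical helpers `l1_share` and `implied_l1_norm`.

```python
def l1_share(value: float, l1_norm: float) -> float:
    """Return |value| as a percentage of the given L1 norm (assumed definition)."""
    return 100.0 * abs(value) / l1_norm


def implied_l1_norm(value: float, percent: float) -> float:
    """Back out the L1 norm that would make |value| equal to `percent` of it."""
    return abs(value) / (percent / 100.0)


# (neuron index, value, % of L1) as listed in the Neuron Alignment table.
rows = [(50, 0.24, 1.8), (25, 0.12, 0.9), (168, 0.12, 0.9)]

# If the assumed definition holds, every row should imply about the same norm.
implied = [implied_l1_norm(value, percent) for _, value, percent in rows]

for (index, value, percent), norm in zip(rows, implied):
    print(f"neuron {index}: value {value:+.2f}, {percent}% -> implied L1 norm {norm:.2f}")
```

All three rows imply an L₁ norm of about 13.3, so the listed figures are at least consistent with this reading, up to the rounding of the displayed values.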
Correlated Neurons
Index    P. Corr.    Cos Sim.
25       +0.24       0.09
168      +0.12       0.09
1691     +0.12       0.07
Negative Logits
Token        Logit
<bos>        -3.76
ⓧ            -1.06
<?           -0.93
/***         -0.92
/**          -0.86
(blank)      -0.81
/*           -0.80
modernize    -0.76
<?           -0.75
harmonize    -0.73
Positive Logits
Token        Logit
unlaw        1.07
riva         0.94
Keny         0.94
pleins       0.94
quoique      0.91
seksi        0.91
véhic        0.90
unwarran     0.90
Muhamma      0.90
marea        0.89
Activations Density 0.175%