INDEX
Explanations
references to social connections and relationships
Neuron Alignment
Index   Value   % of L₁
50      +0.29   1.1%
227     +0.12   0.5%
513     +0.08   0.3%
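The "% of L₁" column is not defined on this page. A minimal sketch of one plausible reading, assuming it is each neuron's absolute value divided by the L₁ norm of the full alignment vector (the function name `l1_share` and the toy vector are hypothetical, not taken from the dashboard):

```python
import numpy as np

def l1_share(weights, top_k=3):
    """Return (index, value, percent_of_l1) for the top_k entries by |value|.

    Assumption: "% of L1" means 100 * |w_i| / sum_j |w_j|.
    """
    weights = np.asarray(weights, dtype=float)
    l1_norm = np.abs(weights).sum()
    if l1_norm == 0:
        raise ValueError("weights has zero L1 norm")
    # Sort indices by descending absolute value and keep the first top_k.
    order = np.argsort(-np.abs(weights))[:top_k]
    return [
        (int(i), float(weights[i]), float(100.0 * abs(weights[i]) / l1_norm))
        for i in order
    ]

# Toy alignment vector (hypothetical data, L1 norm = 1.0).
toy = np.array([0.5, -0.25, 0.125, 0.125])
rows = l1_share(toy, top_k=2)
for index, value, pct in rows:
    print(f"{index}\t{value:+.2f}\t{pct:.1f}%")
```

Under this reading, the rows above would imply an L₁ norm of roughly 24 to 27 for the full vector (for example 0.29 / 0.011), which is at least consistent across the three neurons shown.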
Correlated Neurons
Index   P. Corr.   Cos Sim.
1252    +0.29      0.05
1235    +0.12      0.04
1786    +0.08      0.03
Negative Logits
<bos>        -2.02
/***         -0.64
neutralize   -0.52
enshr        -0.49
ⓧ            -0.49
knelt        -0.48
minimise     -0.48
Agua         -0.48
mobilize     -0.48
modulate     -0.47
Positive Logits
jetta        1.02
riva         1.02
Minang       0.99
Græ          0.99
lele         0.97
sentra       0.91
Palembang    0.90
Meksi        0.89
brune        0.89
croce        0.89
Activations Density 0.691%