Explanations
terms related to fairness or equity
Neuron Alignment
Index    Value    % of L₁
50       +0.16    0.9%
1896     +0.13    0.7%
341      +0.11    0.6%
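The "% of L₁" column appears to express each entry's magnitude as a share of the L₁ norm of the whole direction. For example, 0.16 at 0.9% implies an L₁ norm of roughly 18. Below is a minimal sketch, assuming the table lists the largest-magnitude entries of the feature's direction in neuron space. The function name `neuron_alignment`, the ranking by absolute value, and the toy vector are all hypothetical.

```python
def neuron_alignment(direction: list[float], k: int = 3) -> list[tuple[int, float, float]]:
    """Return (index, value, share of L1 norm) for the k largest-magnitude entries.

    Hypothetical helper: assumes the dashboard ranks neurons by |value|.
    """
    l1 = sum(abs(x) for x in direction)
    # Rank neuron indices by absolute value; sorted() stays stable with
    # reverse=True, so ties keep ascending index order.
    ranked = sorted(range(len(direction)), key=lambda i: abs(direction[i]), reverse=True)
    return [(i, direction[i], abs(direction[i]) / l1) for i in ranked[:k]]


toy = [0.0, 0.5, -0.25, 0.25]  # hypothetical 4-neuron direction
for idx, value, share in neuron_alignment(toy, k=2):
    print(idx, f"{value:+.2f}", f"{share:.1%}")
# 1 +0.50 50.0%
# 2 -0.25 25.0%
```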
Correlated Neurons
Index    Pearson Corr.    Cosine Sim.
1557     +0.16            0.03
1837     +0.13            0.03
341      +0.11            0.03
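The two columns measure different things. A Pearson correlation compares how two signals co-vary across samples, while cosine similarity compares the directions of two fixed vectors. A neuron can therefore correlate with a feature (+0.16) while its weight vector is nearly orthogonal to the feature's direction (0.03). The sketch below computes both measures in plain Python, assuming P. Corr. is computed over paired activation samples and Cos Sim. over weight vectors. The function names are hypothetical.

```python
import math


def pearson_corr(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences (mean-centered)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def cosine_sim(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Shifting a signal by a constant leaves the correlation at 1,
# but the cosine similarity drops below 1.
x = [1.0, 2.0, 3.0]
y = [11.0, 12.0, 13.0]
print(round(pearson_corr(x, y), 6), round(cosine_sim(x, y), 3))
```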
Negative Logits
<bos>                 -3.00
[unrendered token]    -0.93
ⓧ                     -0.85
/*++                  -0.69
/**                   -0.69
lateinit              -0.68
<?                    -0.68
/*                    -0.66
interact              -0.64
develop               -0.63
Positive Logits
bandung               1.51
maroc                 1.48
Minang                1.47
casio                 1.36
quoc                  1.35
cæ                    1.35
nuoc                  1.34
napoli                1.34
brava                 1.34
tramont               1.34
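The logit lists rank vocabulary tokens by how strongly the feature's direction pushes their output logits up or down. A common way to compute this is to project the feature's direction through the unembedding matrix and keep the k highest and k lowest scores. That method is assumed here and not confirmed by the source. `logit_effects`, the toy matrix, and the three-token vocabulary are hypothetical.

```python
def logit_effects(direction, unembed, vocab, k=3):
    """Return (top_k, bottom_k) lists of (token, score) pairs.

    direction: feature direction of length d_model (hypothetical input).
    unembed:   d_model x vocab_size matrix, given as a list of rows.
    """
    scores = []
    for tok_id, tok in enumerate(vocab):
        # Dot product of the direction with this token's unembedding column.
        score = sum(d * unembed[i][tok_id] for i, d in enumerate(direction))
        scores.append((tok, score))
    ranked = sorted(scores, key=lambda pair: pair[1])
    top = ranked[::-1][:k]  # most positive first
    bottom = ranked[:k]     # most negative first
    return top, bottom


# Hypothetical 2-dimensional model with a 3-token vocabulary.
direction = [1.0, 0.0]
unembed = [[2.0, -1.0, 0.5],
           [9.0, 9.0, 9.0]]
top, bottom = logit_effects(direction, unembed, ["a", "b", "c"], k=1)
print(top, bottom)  # [('a', 2.0)] [('b', -1.0)]
```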
Activation Density: 0.078%
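Activation density is the fraction of tokens on which the feature is active. A density of 0.078% corresponds to about 78 activations per 100,000 tokens. Below is a minimal sketch, assuming "active" means an activation above zero. The function name and the `threshold` parameter are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical activations over five tokens: two are above zero.
acts = [0.0, 0.0, 1.2, 0.0, 0.3]
print(f"{activation_density(acts):.3%}")  # 40.000%
```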