INDEX
Explanations
terms related to ethical standards and classifications
Neuron Alignment
Index   Value   % of L₁
50      +0.29   1.2%
1870    +0.11   0.5%
599     +0.11   0.4%
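
The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's weight vector. The sketch below illustrates that reading only; the page does not define the column, so the formula, the function name `l1_shares`, and the toy vector are all assumptions rather than the dashboard's actual computation.

```python
def l1_shares(values):
    """Return each entry's share of the vector's L1 norm, in percent.

    Assumed definition: 100 * |v_i| / sum_j |v_j|.
    """
    l1 = sum(abs(v) for v in values)
    if l1 == 0:
        # An all-zero vector has no meaningful shares.
        return [0.0 for _ in values]
    return [100.0 * abs(v) / l1 for v in values]


# Hypothetical toy vector, not the real weights behind the table above.
toy_weights = [0.5, -0.25, 0.25]
print(l1_shares(toy_weights))
```

Under this reading, the small percentages in the table would mean the feature's weight is spread across many neurons rather than concentrated in the three listed.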
Correlated Neurons
Index   P. Corr.   Cos Sim.
394     +0.29      0.08
599     +0.11      0.07
651     +0.11      0.04
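
The two columns measure different things, which is why they can disagree: Pearson correlation centers both vectors on their means before comparing them, while cosine similarity does not. The sketch below uses the standard definitions in plain Python. Which vectors the dashboard actually compares (activations over tokens, or weight vectors) is not stated on the page, so the inputs here are hypothetical toy data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity(
        [x - mean_a for x in a],
        [y - mean_b for y in b],
    )


# Hypothetical toy data: perfectly anti-correlated, yet with a positive cosine.
a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
print(pearson_correlation(a, b))
print(cosine_similarity(a, b))
```

In the toy data, both vectors have only positive entries, so their cosine similarity stays positive even though one rises as the other falls.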
Negative Logits
<bos>     -2.70
ⓧ         -1.01
/**       -0.97
<?        -0.96
(blank)   -0.90
/***      -0.78
/*        -0.74
Chham     -0.68
<?        -0.63
/*!       -0.62
Positive Logits
bandung    1.23
milano     1.23
paradiso   1.15
maroc      1.14
seksi      1.11
italia     1.10
lele       1.09
jaya       1.08
ananas     1.08
jawa       1.06
Activations Density 0.736%