INDEX
Explanations
phrases related to human rights issues
Neuron Alignment
Index   Value   % of L₁
50      +0.18   0.9%
1967    +0.10   0.5%
1438    +0.08   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1870    +0.18      0.08
1438    +0.10      0.10
1385    +0.08      0.17
Negative Logits
<bos>          -2.45
<?             -0.76
ویکیآمباردا    -0.76
/***           -0.75
Autoritní      -0.73
propOrder      -0.70
/**            -0.64
Билгалдахарш   -0.64
///**          -0.63
<!--[          -0.61
POSITIVE LOGITS
Minang    1.05
siyah     1.03
jawa      1.02
Karang    1.00
Jambi     0.98
Hitam     0.97
bandung   0.95
Banjar    0.94
jaya      0.92
yanto     0.89
Activations Density 3.110%