INDEX
Explanations
phrases related to public statements or declarations
Neuron Alignment
Index   Value   % of L₁
964     +0.10   0.3%
394     +0.08   0.2%
1705    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
838     +0.10      0.05
284     +0.08      0.05
709     +0.08      0.02
Negative Logits
bianchi   -0.63
sieur     -0.61
rossi     -0.61
koc       -0.60
maroc     -0.59
bronz     -0.58
babi      -0.58
ananas    -0.58
lele      -0.57
lampe     -0.56
Positive Logits
Mə                 0.69
Mitä               0.64
autorytatywna      0.63
Manbalar           0.62
Și                 0.60
Să                 0.59
webElementXpaths   0.59
pueden             0.58
disambiguazione    0.58
Răsp               0.58
Activations Density 0.573%