INDEX
Explanations
phrases related to statistical or numerical information
Neuron Alignment
Index   Value   % of L₁
1473    +0.12   0.4%
468     +0.09   0.3%
1905    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1473    +0.12      0.04
1830    +0.09      0.04
1380    +0.08      0.02
Negative Logits
restera      -0.74
prendra      -0.73
viciss       -0.71
Souha        -0.67
Czechos      -0.66
compréhen    -0.66
philanth     -0.66
devint       -0.63
encomp       -0.63
Vaata        -0.62
Positive Logits
estekak          0.65
препратки        0.64
,                0.56
/**              0.54
Havolalar        0.54
tanleria         0.53
<bos>            0.51
luia             0.51
lèvres           0.51
tetrachloride    0.50
Activations Density: 0.209%