INDEX
Explanations
mentions of beliefs related to religious or scientific views
Neuron Alignment
Index   Value   % of L₁
1842    +0.17   0.5%
872     +0.14   0.4%
1253    +0.13   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1915    +0.17      0.05
1380    +0.14      0.02
295     +0.13      0.02
Negative Logits
kram      -0.87
solidar   -0.74
zyn       -0.74
uhr       -0.73
durs      -0.73
hina      -0.72
ohr       -0.67
lemp      -0.67
mme       -0.67
kano      -0.66
Positive Logits
trône     0.61
Existen   0.54
Février   0.54
berço     0.52
oiseau    0.52
noël      0.51
Tienen    0.51
ziua      0.50
rocher    0.49
curé      0.48
Activations Density: 0.572%