INDEX
Explanations
concerns related to health risks and medical interventions
Neuron Alignment
Index    Value    % of L₁
381      +0.10    0.3%
1809     +0.09    0.2%
1919     +0.07    0.2%
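
The "% of L₁" column is not defined on this page. One plausible reading, offered here only as an assumption, is that it reports the absolute value of a single entry as a share of the L₁ norm (the sum of absolute values) of the whole vector. A minimal sketch of that reading, using a hypothetical helper `l1_share` and a toy vector that is not taken from the table above:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: this is one possible meaning of a "% of L1" column;
    the dashboard itself does not state its formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector (hypothetical values, chosen so the arithmetic is exact).
toy = [0.5, -0.25, 0.25]
print(l1_share(toy, 0))
```

Under this reading, small percentages such as those in the table would indicate that no single neuron dominates the vector.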
Correlated Neurons
Index    P. Corr.    Cos Sim.
1919     +0.10       0.06
1415     +0.09       0.02
1809     +0.07       0.03
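
Assuming "P. Corr." abbreviates Pearson correlation and "Cos Sim." cosine similarity, the two columns measure related but different things: Pearson correlation is cosine similarity computed after subtracting each vector's mean. The sketch below uses the standard definitions with toy inputs; which vectors the dashboard actually compares is not stated on this page, and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy inputs (hypothetical): b decreases as a increases, yet both are positive.
a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
print(pearson_correlation(a, b))  # close to -1
print(cosine_similarity(a, b))    # positive, since no centering is applied
```

The toy case shows why the two columns can disagree in magnitude or even in sign: centering removes the shared offset that cosine similarity still sees.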
Negative Logits
optik        -0.87
augus        -0.85
galeri       -0.85
alpes        -0.85
Fasc         -0.85
monaster     -0.85
lele         -0.83
keramik      -0.82
antik        -0.82
poliester    -0.81
Positive Logits
Pozdrawiam        0.66
einerseits        0.59
sice              0.58
lapsingToolbar    0.58
nominally         0.56
neither           0.55
pozdrawiam        0.55
atât              0.54
arată             0.54
își               0.54
Activations Density: 0.438%