INDEX
Explanations
terms related to morbidity, mortality, and social justice issues
Neuron Alignment
Index   Value   % of L₁
216     +0.12   0.7%
186     +0.10   0.6%
125     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
38      +0.12      0.01
450     +0.10      0.01
186     +0.10      0.01
Negative Logits
Token         Value
googleapis    -2.22
enem          -1.76
phones        -1.66
ubunt         -1.59
↵             -1.55
(not shown)   -1.55
↵             -1.55
(not shown)   -1.55
(not shown)   -1.55
↵             -1.55
Positive Logits
Token    Value
nier     1.59
BER      1.57
isan     1.53
liest    1.53
varies   1.52
ensch    1.46
outwe    1.46
frame    1.43
rs       1.42
ality    1.42
Activations Density: 0.103%