INDEX
Explanations
words related to assigning responsibility or fault
Neuron Alignment
Index   Value   % of L₁
896     +0.14   0.5%
260     +0.11   0.3%
130     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
896     +0.14      0.02
901     +0.11      0.03
260     +0.10      0.02
Negative Logits
ekos       -0.63
antik      -0.63
protokol   -0.63
makro      -0.62
silikon    -0.61
elek       -0.61
bambu      -0.60
mily       -0.60
konsult    -0.59
karton     -0.58
Positive Logits
Blame       1.27
blame       1.21
blame       1.16
Blame       1.13
blamed      1.09
blames      1.08
blaming     1.08
hairc       0.87
faulting    0.84
swarovski   0.80
Activations Density: 0.091%