INDEX
Explanations
New Auto-Interp: Examples of events or situations that lead to blame being assigned.
Neuron Alignment
Index    Value    % of L₁
2034     +0.15    0.4%
1013     +0.12    0.4%
260      +0.10    0.3%
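The Neuron Alignment table lists the model neurons with the largest weights in this feature's decoder vector. A minimal sketch of how such a table could be computed, assuming that "Value" is the raw decoder weight and "% of L₁" is that weight's absolute value divided by the L1 norm of the whole decoder vector. The function name `neuron_alignment` and the toy decoder row are hypothetical, and the dashboard's own definition may differ:

```python
def neuron_alignment(decoder_row, k=3):
    """Return (neuron_index, weight, pct_of_l1) for the k largest weights.

    Assumption (not confirmed by the source): neurons are ranked by raw
    weight, and the percentage is |weight| / L1 norm of the whole row.
    """
    l1 = sum(abs(w) for w in decoder_row)
    # Sort neuron indices by their decoder weight, largest first.
    top = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)[:k]
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1) for i in top]


# Hypothetical toy decoder row with six neurons.
row = [0.10, -0.05, 0.30, 0.05, 0.20, -0.30]
print(neuron_alignment(row))
```

With this toy row the three largest weights sit at indices 2, 4 and 0; because the row's L1 norm is 1.0, their shares are about 30%, 20% and 10%.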
Correlated Neurons
Index    Pearson Corr.    Cos. Sim.
284      +0.15            0.07
2044     +0.12            0.07
2030     +0.10            0.03
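One reading of these two columns, assumed here rather than confirmed by the source, is that both compare this feature's activations with a neuron's activations over the same tokens. Under that reading, Pearson correlation is the cosine similarity after subtracting each series' mean, while cosine similarity is computed without centering. The sketch below shows that relationship; the function names are hypothetical:

```python
import math


def cosine_similarity(x, y):
    """Uncentered cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        # Undefined for a zero vector; returning 0.0 is an assumed convention.
        return 0.0
    return dot / norm


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return cosine_similarity([a - mx for a in x], [b - my for b in y])


# Hypothetical activations of one feature and one neuron over five tokens.
feature_acts = [0.0, 1.2, 0.0, 0.8, 0.0]
neuron_acts = [-0.1, 0.5, 0.2, 0.4, -0.2]
print(pearson_correlation(feature_acts, neuron_acts))
print(cosine_similarity(feature_acts, neuron_acts))
```

Centering is what lets the two numbers disagree: for x = [1, 2, 3] and y = [3, 2, 1], the cosine similarity is 10/14 (about 0.71), but the Pearson correlation is -1.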
Negative Logits
?...            -0.95
fatis           -0.94
!...            -0.91
nece            -0.86
fuf             -0.85
perfon          -0.85
dichi           -0.82
Luglio          -0.82
Dijo            -0.82
specialmente    -0.81
Positive Logits
;               0.77
.               0.76
.;              0.68
because         0.66
but             0.65
!               0.61
;               0.61
here            0.61
;               0.59
for             0.59
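The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch, assuming each value is the feature's direct logit effect: the dot product of its decoder vector with each token's column of the unembedding matrix. It ignores any normalization or layer norm the dashboard may apply. `logit_effects`, the toy unembedding and the toy vocabulary are hypothetical:

```python
def logit_effects(decoder_vec, unembed, vocab, k=3):
    """Return (bottom_k, top_k) lists of (token, effect) pairs.

    unembed[i][t] is the (hypothetical) weight from residual dimension i
    to vocabulary token t.
    """
    n_vocab = len(vocab)
    # Direct effect on each token's logit: decoder_vec · unembed[:, t].
    effects = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        for t in range(n_vocab)
    ]
    order = sorted(range(n_vocab), key=lambda t: effects[t])  # most negative first
    bottom = [(vocab[t], effects[t]) for t in order[:k]]
    top = [(vocab[t], effects[t]) for t in reversed(order[-k:])]  # most positive first
    return bottom, top


# Hypothetical 2-dimensional residual stream and 4-token vocabulary.
decoder_vec = [1.0, 0.0]
unembed = [[0.5, -1.0, 2.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
print(logit_effects(decoder_vec, unembed, ["a", "b", "c", "d"], k=2))
```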
Activation Density: 0.587%
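Activation density is most naturally read as the share of dataset tokens on which the feature fires. On that reading, 0.587% corresponds to roughly 587 of every 100,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero. The `threshold` parameter and the function name are hypothetical:

```python
def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation is strictly above threshold."""
    if not acts:
        return 0.0
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


# Hypothetical activations over eight tokens: the feature fires on two of them.
print(activation_density([0.0, 0.0, 1.5, 0.0, 0.2, 0.0, 0.0, 0.0]))
```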