INDEX
Explanation (New Auto-Interp): cases or scenarios where a problem arises
Neuron Alignment
Index   Value   % of L₁
1150    +0.10   0.3%
674     +0.10   0.3%
1108    +0.09   0.3%
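A minimal sketch of how a table like this could be computed, assuming "Value" is the feature's decoder weight on each neuron and "% of L₁" is that weight's absolute value divided by the L1 norm of the whole decoder vector. The function name and the ranking by absolute weight are illustrative assumptions, not the dashboard's documented method. Under that reading, a weight of 0.10 being only 0.3% of L₁ implies an L1 norm of roughly 0.10 / 0.003 ≈ 33, so the feature is spread across many neurons rather than aligned with any single one.

```python
def neuron_alignment(decoder_vec, top_k=3):
    """Return (neuron_index, weight, fraction_of_l1) for the top_k neurons.

    Assumptions (hypothetical, not taken from the dashboard source):
    neurons are ranked by |weight|, and "% of L1" = |weight| / ||decoder_vec||_1.
    """
    l1 = sum(abs(w) for w in decoder_vec)  # L1 norm of the feature's decoder vector
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: abs(decoder_vec[i]),
                    reverse=True)  # largest-magnitude weights first
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:top_k]]
```

For example, `neuron_alignment([0.0, 0.5, -0.125, 0.375], top_k=2)` returns `[(1, 0.5, 0.5), (3, 0.375, 0.375)]`.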
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
814     +0.10           0.02
1026    +0.10           0.02
1548    +0.09           0.02
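The two columns are standard similarity measures between two vectors. The dashboard does not say which vectors it compares for each neuron (for example, feature and neuron activations over the same tokens), so the sketch below only shows the generic definitions; the function names are hypothetical. Pearson correlation subtracts each vector's mean before comparing, while cosine similarity does not, which is one reason the two numbers can differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(xs, ys):
    """Cosine similarity: dot product divided by the product of L2 norms."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / (math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys)))
```

For instance, `pearson([1, 2, 3], [3, 2, 1])` is -1 (up to floating-point rounding), while `cosine([1, 0], [0, 1])` is 0.0.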
Negative Logits
palio      -0.86
riva       -0.80
virtù      -0.77
archivio   -0.76
sopr       -0.76
quegli     -0.75
pietre     -0.74
Luglio     -0.73
autunno    -0.72
ridu       -0.72
Positive Logits
unspeak     0.68
ineffec     0.66
unavoid     0.65
vainly      0.64
apprehen    0.63
rouse       0.63
gratify     0.61
crouching   0.59
prolly      0.58
case        0.57
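A minimal sketch of how positive and negative logit lists like these could be produced, assuming each value is the dot product of the feature's output direction (already mapped into residual-stream space) with one column of the unembedding matrix. The names `logit_effects`, `direction`, `unembed`, and `vocab` are hypothetical, and the toy vocabulary below is not real model data.

```python
def logit_effects(direction, unembed, vocab, k=3):
    """Rank tokens by how much a residual-stream direction pushes their logits.

    direction: list of d_model floats (hypothetical feature output direction)
    unembed:   d_model rows, each with one entry per token, i.e. unembed[d][t]
    vocab:     token strings, one per unembedding column
    """
    scores = [
        (tok, sum(direction[d] * unembed[d][t] for d in range(len(direction))))
        for t, tok in enumerate(vocab)
    ]
    scores.sort(key=lambda pair: pair[1])  # ascending: most negative first
    negative = scores[:k]
    positive = scores[::-1][:k]  # descending: most positive first
    return negative, positive


neg, pos = logit_effects(
    [1.0, -1.0],
    [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, 3.0]],
    ["a", "b", "c", "d"],
    k=2,
)
print(neg)  # [('d', -3.0), ('b', -1.0)]
print(pos)  # [('c', 2.0), ('a', 1.0)]
```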
Activation Density: 0.133%
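Activation density is most naturally read as the fraction of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly greater than zero, which fits a ReLU-style dictionary but is an assumption here; the function name is hypothetical. At 0.133%, the feature would be active on roughly one token in 750 (1 / 0.00133 ≈ 752).

```python
def activation_density(activations):
    """Fraction of tokens with a strictly positive feature activation (assumed firing rule)."""
    return sum(1 for a in activations if a > 0) / len(activations)


density = activation_density([0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 1.2])
print(f"{density:.3%}")  # 25.000%
```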