Explanations
data-related terms and phrases
Neuron Alignment
Index   Value   % of L₁
876     +0.13   0.4%
781     +0.12   0.3%
344     +0.10   0.3%
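The Neuron Alignment columns can be read as follows, under an assumption the dashboard does not spell out: "Value" is the entry of the feature's decoder vector at that neuron index, and "% of L₁" is that entry's magnitude divided by the L₁ norm of the whole decoder vector. A minimal sketch of that reading, using a hypothetical helper `neuron_alignment` and a toy vector rather than the dashboard's data:

```python
def neuron_alignment(decoder_row, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Hypothetical helper: assumes "Value" is the raw decoder entry and
    "% of L1" is |entry| / sum(|entries|) over the whole decoder vector.
    """
    l1 = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by absolute weight, largest first (ties keep index order).
    ranked = sorted(range(len(decoder_row)), key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy 4-neuron decoder vector (not the dashboard's data).
print(neuron_alignment([0.25, -0.5, 0.125, 0.125], k=2))
# → [(1, -0.5, 0.5), (0, 0.25, 0.25)]
```

On the rounded figures shown (+0.13 at 0.4%), this reading would put the decoder's L₁ norm on the order of 30, meaning the weight is spread thinly over many neurons rather than concentrated in any one.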
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
781     +0.13           0.03
678     +0.12           0.04
1119    +0.10           0.03
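The Correlated Neurons table reports two different similarity measures. Which vectors each one compares is not stated here; a plausible assumption is that P. Corr. is the Pearson correlation between the feature's activations and a neuron's activations over a sample of tokens, and Cos Sim. is the cosine similarity between two weight vectors. The two measures themselves can be sketched in plain Python (the activation series below are toy values):

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (no mean-centering, unlike Pearson)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Toy activation series for a feature and a neuron over five tokens.
feature_acts = [0.0, 1.0, 0.0, 2.0, 0.0]
neuron_acts = [0.1, 0.9, 0.2, 1.8, 0.0]
print(round(pearson(feature_acts, neuron_acts), 3))
# → 0.995
```

Because cosine similarity skips the mean-centering step, and because the two columns may be computed over different vectors, the two numbers for the same neuron need not agree.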
Negative Logits
Estaba     -0.72
Dijo       -0.68
barbarous  -0.65
McLaugh    -0.63
Había      -0.62
Tampoco    -0.62
pamph      -0.60
Aún        -0.60
Leurs      -0.59
pettico    -0.59
Positive Logits
ananas     0.89
teras      0.87
blin       0.84
stik       0.84
abnorm     0.84
alpes      0.83
sement     0.83
vort       0.82
glan       0.81
palet      0.77
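The Negative Logits and Positive Logits lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. A common way to compute such lists, assumed here rather than confirmed by the dashboard, is to multiply the feature's decoder direction by the model's unembedding matrix and take the bottom and top k tokens. The sketch below ignores any final layer norm; `top_logit_tokens` and the toy matrices are hypothetical:

```python
def top_logit_tokens(direction, unembed, vocab, k=3):
    """Return (negative, positive) lists of (token, score) pairs.

    direction: length-d feature direction (assumed: the decoder vector).
    unembed:   d rows, each a list of len(vocab) floats (the unembedding matrix).
    """
    d = len(direction)
    # Logit contribution for token t is the dot product of direction with column t.
    scores = [sum(direction[i] * unembed[i][t] for i in range(d)) for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # most negative first
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]  # most positive first
    return negative, positive


# Toy 2-dimensional model with a 3-token vocabulary.
direction = [1.0, 0.0]
unembed = [[0.5, -1.0, 2.0],
           [9.0, 9.0, 9.0]]
print(top_logit_tokens(direction, unembed, ["a", "b", "c"], k=1))
# → ([('b', -1.0)], [('c', 2.0)])
```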
Activation Density: 0.444%
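Activation density is read here as the fraction of tokens in a sample on which the feature fires, i.e. has an activation above zero; the threshold and the token sample behind the 0.444% figure are not given, so both are assumptions. On that reading, the feature is active on roughly 1 token in 225 (1 / 0.00444 ≈ 225). A minimal sketch with toy activations:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one token")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy activations over eight tokens: two are active.
print(f"{activation_density([0.0, 0.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]):.3%}")
# → 25.000%
```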