INDEX
Explanations
content related to copyright, licensing, and permissions
Neuron Alignment
Index   Value   % of L₁
1252    +0.09   0.2%
1897    +0.08   0.2%
185     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
185     +0.09      0.03
958     +0.08      0.03
2001    +0.08      0.03
Negative Logits
grumbled    -0.53
Witkin      -0.51
bosky       -0.51
shuddered   -0.50
murmured    -0.50
shivered    -0.50
McLaugh     -0.49
mumbled     -0.49
izvē        -0.49
escrever    -0.49
Positive Logits
drap     0.66
tass     0.65
stoff    0.65
taz      0.63
molle    0.62
reser    0.62
canne    0.61
tred     0.61
piaci    0.60
copie    0.60
Activations Density 0.232%