Explanations
descriptions of personal experiences and reflections
Neuron Alignment
Index   Value   % of L₁
1962    +0.10   0.3%
297     +0.09   0.3%
1589    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1962    +0.10      0.06
929     +0.09      0.04
1010    +0.08      0.05
Negative Logits
balon        -0.78
rú           -0.73
parlamento   -0.73
utop         -0.71
meras        -0.70
kön          -0.70
pavo         -0.70
torba        -0.68
StructEnd    -0.67
karton       -0.66
Positive Logits
maintained    0.48
alternating   0.47
kept          0.46
montr         0.45
lüğü          0.45
Keeps         0.44
jusqu         0.43
moteur        0.43
held          0.43
keep          0.43
Activations Density 0.324%