INDEX
Explanations
references to important or significant concepts or entities
Neuron Alignment
Index    Value    % of L₁
994      +0.17    0.6%
871      +0.14    0.5%
1573     +0.12    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
994      +0.17       0.04
1573     +0.14       0.04
871      +0.12       0.04
Negative Logits
tolerably       -0.72
withal          -0.63
endeavouring    -0.59
effectually     -0.58
apprehen        -0.57
impelled        -0.57
fays            -0.55
Condem          -0.54
indestru        -0.54
encomp          -0.53
Positive Logits
great       1.19
Great       1.12
Great       1.12
great       1.08
GREAT       1.04
GREAT       0.98
grande      0.59
grans       0.58
greatest    0.57
grande      0.57
Activations Density: 0.082%