INDEX
Explanations
dollar amounts or percentages mentioned in the context
Neuron Alignment
Index    Value    % of L₁
1978     +0.19    0.6%
453      +0.13    0.4%
776      +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1978     +0.19       0.05
776      +0.13       0.05
815      +0.09       0.04
Negative Logits
vœ           -0.93
suscep       -0.91
indestru     -0.90
intersper    -0.89
beaute       -0.88
outlander    -0.88
élégante     -0.85
hairc        -0.85
reconno      -0.84
vagu         -0.84
Positive Logits
kafe       0.85
krim       0.83
bakar      0.82
karton     0.78
keramik    0.77
karbon     0.76
ekos       0.76
kristal    0.76
klinik     0.76
bunda      0.74
Activations Density: 0.165%