Explanations
specific sequences of words and their context within a narrative
Neuron Alignment
Index   Value   % of L₁
23      +0.33   1.9%
156     +0.13   0.8%
1       +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
23      +0.33      0.11
488     +0.13      0.06
56      +0.12      -0.01
Negative Logits
jsfiddle   -2.08
watson     -2.04
à±ģ        -1.89
àµį        -1.75
dimen      -1.71
à±į        -1.65
pad        -1.63
à±         -1.54
ées        -1.54
coin       -1.52
Positive Logits
revealed       1.92
unsuccessful   1.72
disastrous     1.67
later          1.66
revealing      1.66
subsequent     1.65
prompted       1.62
children       1.61
soon           1.60
fate           1.60
Activations Density: 1.144%