INDEX
Explanations
instances of conjunctions and transition phrases
Neuron Alignment
Index    Value    % of L₁
872      +0.14    0.4%
1042     +0.11    0.3%
1445     +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1892     +0.14       0.05
1806     +0.11       0.05
62       +0.09       0.04
Negative Logits
Kategor    -0.84
lele       -0.83
teras      -0.81
antik      -0.79
kram       -0.79
gend       -0.77
meras      -0.76
panik      -0.75
kac        -0.75
optik      -0.75
Positive Logits
deserves       0.81
prolly         0.80
certainly      0.70
reflects       0.67
probably       0.67
therefor       0.67
contributes    0.66
shouldn        0.66
shouldnt       0.65
reminds        0.65
Activations Density: 0.371%