Explanations
a pattern related to subscription or signing up for something
Neuron Alignment
Index   Value   % of L₁
1839    +0.11   0.3%
629     +0.10   0.3%
152     +0.08   0.2%
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
629     +0.11           0.02
374     +0.10           0.02
1466    +0.08           0.02
Negative Logits
Iš        -0.69
Į         -0.64
(blank)   -0.62
robus     -0.61
bēr       -0.61
habet     -0.61
Katso     -0.60
ļ         -0.60
šķ        -0.59
potest    -0.57
Positive Logits
disreg       0.59
chré         0.59
pathfinder   0.58
thank        0.56
<bos>        0.56
wrangler     0.55
lakers       0.55
bytu         0.55
DONOR        0.55
cryst        0.54
Activations Density: 0.051%