Explanations
occurrences of the word "another"
Neuron Alignment
Index   Value   % of L₁
239     +0.13   0.8%
362     +0.13   0.7%
359     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
239     +0.13      0.05
359     +0.13      0.05
199     +0.12      0.04
Negative Logits
ards       -1.60
arde       -1.57
ende       -1.51
ors        -1.51
esta       -1.49
roc        -1.41
apine      -1.41
Rapids     -1.36
prom       -1.33
balloons   -1.32
Positive Logits
than       1.69
than       1.61
world      1.58
Than       1.58
liking     1.57
leans      1.57
possible   1.55
gree       1.54
hundred    1.51
uras       1.49
Activations Density 0.516%