INDEX
Explanations
occurrences of the word "were."
Neuron Alignment
Index  Value  % of L₁
361    +0.14  0.8%
345    +0.13  0.7%
481    +0.12  0.6%
Correlated Neurons
Index  Pearson Corr.  Cos Sim.
304    +0.14          0.09
32     +0.13          0.07
351    +0.12          0.07
Negative Logits
Token    Logit
ĥ½       -2.00
Caption  -1.75
²        -1.73
gs       -1.64
¹        -1.63
hers     -1.53
labels   -1.50
¿½       -1.46
wear     -1.41
¥        -1.38
Positive Logits
Token  Logit
eer    1.60
cht    1.58
afen   1.54
ophe   1.53
isco   1.53
aul    1.52
FPar   1.48
riton  1.40
olin   1.40
opan   1.39
Activations Density: 0.108%