INDEX
Explanations
instances of the word "recap"
Neuron Alignment
Index   Value   % of L₁
376     +0.18   1.0%
2       +0.13   0.7%
148     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
69      +0.18      0.03
151     +0.13      0.03
109     +0.12      0.02
Negative Logits
ĭ       -2.24
ı       -1.87
nez     -1.82
ī       -1.70
§       -1.69
ĥ½      -1.62
ESULT   -1.61
ances   -1.60
į       -1.59
¼       -1.57
Positive Logits
it        2.10
ital      1.77
itet      1.55
atory     1.55
ulous     1.54
ository   1.48
itor      1.47
itals     1.46
athi      1.45
apart     1.44
Activations Density: 1.648%