INDEX
Explanations
phrases related to a specific action or task that needs to be carried out
Neuron Alignment
Index   Value   % of L₁
1967    +0.13   0.4%
1328    +0.12   0.4%
1334    +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1415    +0.13      0.04
1334    +0.12      0.04
1328    +0.11      0.04
Negative Logits
fta          -1.70
thut         -1.67
Intere       -1.59
guarante     -1.56
depic        -1.56
increa       -1.55
desir        -1.54
intersper    -1.53
fays         -1.52
Augu         -1.52
Positive Logits
be           0.92
provide      0.82
help         0.79
promote      0.79
enhance      0.78
assist       0.78
serve        0.78
facilitate   0.77
withstand    0.76
protect      0.75
Activations Density: 0.157%