INDEX
Explanations
phrases related to searching, investigating, and locating specific entities or objects
Neuron Alignment
Index    Value    % of L₁
908      +0.09    0.3%
468      +0.09    0.2%
1110     +0.07    0.2%
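
The "% of L₁" figures are not defined on this page. One plausible reading, offered here only as an assumption, is that each neuron's value is reported as a share of the L₁ norm (the sum of absolute values) of the feature's vector over neurons. The sketch below illustrates that reading on a toy vector; the function names `l1_share` and `top_aligned` are hypothetical, and the toy numbers are not taken from this feature.

```python
def l1_share(vector):
    """Return each entry's absolute value as a fraction of the vector's L1 norm."""
    l1 = sum(abs(v) for v in vector)
    if l1 == 0:
        raise ValueError("L1 norm is zero; shares are undefined")
    return [abs(v) / l1 for v in vector]


def top_aligned(vector, k=3):
    """Return (index, value, share of L1) for the k entries largest in magnitude."""
    shares = l1_share(vector)
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)[:k]
    return [(i, vector[i], shares[i]) for i in order]


# Toy 5-dimensional vector for illustration only (assumed data, not from the page).
toy = [0.5, -0.25, 0.125, 0.0, 0.125]
for idx, val, share in top_aligned(toy, k=2):
    print(f"{idx:>5}  {val:+.2f}  {share:.1%}")
```

Under this reading, small percentages such as 0.2% to 0.3% would mean the feature's weight is spread over many neurons rather than concentrated in a few.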
Correlated Neurons
Index    P. Corr.    Cos Sim.
946      +0.09       0.04
908      +0.09       0.02
1250     +0.07       0.01
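
For readers unfamiliar with the two columns, the sketch below computes both statistics for a pair of vectors, assuming that "P. Corr." abbreviates the Pearson correlation and "Cos Sim." the cosine similarity. Which vectors the page actually compares (activations over a dataset, weight directions, or something else) is not stated here, so the inputs are toy values and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def cosine(x, y):
    """Cosine similarity: dot product divided by the product of Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Toy inputs for illustration only (assumed data, not from the page).
feature_acts = [0.0, 1.0, 2.0, 3.0]
neuron_acts = [1.0, 3.0, 5.0, 7.0]
print(pearson(feature_acts, neuron_acts), cosine(feature_acts, neuron_acts))
```

The two measures differ in that Pearson correlation subtracts the mean of each vector first, so they can disagree when the vectors have a large constant offset.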
Negative Logits
Token    Logit
thut     -1.46
effe     -1.44
fte      -1.37
embra    -1.37
fta      -1.35
aen      -1.34
ftu      -1.31
nece     -1.28
fep      -1.28
fto      -1.26
Positive Logits
Token        Logit
find         0.90
find         0.89
elusive      0.86
search       0.85
found        0.83
searching    0.81
searched     0.81
locate       0.81
search       0.80
finds        0.80
Activations Density 0.496%