INDEX
Explanations
questions starting with "what do" followed by certain phrases or terms
Neuron Alignment
Index   Value   % of L₁
703     +0.14   0.6%
1978    +0.13   0.6%
50      +0.12   0.5%
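A minimal sketch of how a table like this could be produced. It assumes "Value" is the neuron's weight in the feature's decoder vector, "% of L₁" is that weight's magnitude divided by the vector's L₁ norm, and neurons are ranked by absolute weight. The function name `neuron_alignment` and the toy vector are hypothetical; this is not the dashboard's own code or data.

```python
def neuron_alignment(decoder_vec, k=3):
    """Rank neurons by |weight| in a (hypothetical) feature decoder vector.

    Returns (neuron index, signed weight, fraction of the vector's L1 norm)
    for the top k neurons.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    ranked = sorted(range(len(decoder_vec)),
                    key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Toy 6-neuron vector whose L1 norm is exactly 1.0 (illustrative only).
vec = [0.125, -0.5, 0.25, 0.0, 0.0625, 0.0625]
for idx, val, frac in neuron_alignment(vec):
    print(f"{idx}  {val:+.3f}  {frac:.1%}")
```

Ranking by absolute weight is itself an assumption: under it a strongly negative weight would also appear, whereas the rows above happen to be all positive. Read the other way, a 0.6% share for a weight of 0.14 implies an L₁ norm of roughly 23 for this feature's vector.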
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
1634    +0.14           0.07
703     +0.13           0.07
1978    +0.12           0.07
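The two metrics in this table can be sketched in plain Python. This assumes the correlation column is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens. Which two vectors the cosine-similarity column compares is not stated here, so `cosine` is shown generically. The toy activation lists are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine(x, y):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


# Toy per-token activations (illustrative only): the neuron tracks the
# feature exactly, but with a constant offset of 0.5.
feature_acts = [0.0, 1.0, 0.0, 2.0]
neuron_acts = [0.5, 1.5, 0.5, 2.5]
print(pearson(feature_acts, neuron_acts))  # ~1.0
print(cosine(feature_acts, neuron_acts))   # ~0.97, lowered by the offset
```

Pearson correlation ignores a constant offset, but cosine similarity does not. That is one reason the two columns can rank the same neurons differently.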
Negative Logits
<bos>                -2.74
/**                  -0.72
ⓧ                    -0.70
(unrendered token)   -0.68
inaugurate           -0.60
AssemblyCompany      -0.57
bardziej             -0.56
/*                   -0.56
endow                -0.55
ajudá                -0.55
Positive Logits
Minang     1.01
bandung    0.96
lele       0.93
majest     0.90
loto       0.90
karton     0.89
utop       0.87
ohr        0.86
gubern     0.84
laci       0.84
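One common way to build lists like the negative and positive logits above is to project the feature's decoder direction through the unembedding matrix and read off the lowest- and highest-scoring tokens, ignoring any final layer norm. That method is assumed here, not confirmed by the dashboard. The names `logit_effects`, `decoder_vec`, and `unembed`, the row-per-token layout, and the toy vocabulary are all hypothetical.

```python
def logit_effects(decoder_vec, unembed, vocab, k=3):
    """Score each token by the dot product of decoder_vec with its unembedding
    row, then return the k most negative and k most positive (token, score) pairs.

    unembed: one row per vocabulary token, each of length d_model (hypothetical layout).
    """
    scores = [
        (tok, sum(d * u for d, u in zip(decoder_vec, row)))
        for tok, row in zip(vocab, unembed)
    ]
    scores.sort(key=lambda pair: pair[1])  # ascending by score
    negative = scores[:k]
    positive = scores[::-1][:k]
    return negative, positive


# Toy 2-dimensional model with a 4-token vocabulary (illustrative only).
vocab = ["a", "b", "c", "d"]
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
neg, pos = logit_effects([2.0, -1.0], unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```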
Activation Density: 0.507%
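Activation density is read here as the fraction of tokens on which the feature's activation is positive. The zero threshold and the function name `activation_density` are assumptions. Under that reading, 0.507% means the feature fires on roughly 5 of every 1,000 tokens.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed to be 0)."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


# Toy run: 1,000 tokens, 5 of them active (illustrative only).
acts = [0.0] * 1000
for i in (3, 150, 420, 777, 999):
    acts[i] = 1.3
print(f"{activation_density(acts):.3%}")  # 0.500%
```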