INDEX
Explanations
phrases containing the word "quit."
Neuron Alignment
Index   Value   % of L₁
260     +0.11   0.4%
1416    +0.10   0.3%
605     +0.10   0.3%
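
The dashboard does not define the "% of L₁" column. One plausible reading, offered here only as an assumption, is that it reports each neuron's share of the L₁ norm of the feature's weight vector. A minimal sketch under that assumption (the function name `l1_share` and the toy weights are hypothetical, not taken from the dashboard):

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: "% of L1" means one entry's absolute value divided by
    the sum of absolute values of all entries.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return abs(weights[index]) / l1_norm


# Toy vector, not real dashboard data.
toy_weights = [0.5, -0.25, 0.25]
share = l1_share(toy_weights, 0)
print(f"{share:.1%}")
```

Under this reading, a value of +0.11 at 0.4% would imply an L₁ norm of roughly 0.11 / 0.004 ≈ 27, subject to the rounding of both displayed figures.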
Correlated Neurons
Index   P. Corr.   Cos Sim.
260     +0.11      0.02
1166    +0.10      0.02
1135    +0.10      0.01
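
The "P. Corr." and "Cos Sim." columns can differ for the same neuron because Pearson correlation is the cosine similarity of mean-centered vectors. The sketch below illustrates that relationship. It assumes both statistics are computed over paired activation values of the feature and the neuron, which the dashboard does not state; the function names and toy inputs are hypothetical.

```python
import math


def cosine_similarity(xs, ys):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    return dot / (norm_x * norm_y)


def pearson_correlation(xs, ys):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    centered_x = [x - mean_x for x in xs]
    centered_y = [y - mean_y for y in ys]
    return cosine_similarity(centered_x, centered_y)


# Toy activations, not real dashboard data.
feature_acts = [1.0, 2.0, 3.0]
neuron_acts = [3.0, 2.0, 1.0]
print(pearson_correlation(feature_acts, neuron_acts))
print(cosine_similarity(feature_acts, neuron_acts))
```

In the toy example the two vectors are perfectly anti-correlated yet have a positive cosine similarity, since neither has zero mean.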
Negative Logits
effe      -0.91
desir     -0.89
purcha    -0.89
increa    -0.88
alre      -0.87
thut      -0.86
oun       -0.86
nece      -0.85
oner      -0.84
fta       -0.83
Positive Logits
quit              1.07
quit              0.91
quitting          0.88
Quit              0.85
Quit              0.84
quits             0.79
QUIT              0.66
QUIT              0.66
WriteTagHelper    0.63
<bos>             0.61
Activations Density 0.069%