Explanations
phrases related to communication and conversation
Neuron Alignment
Index   Value   % of L₁
1150    +0.14   0.4%
453     +0.11   0.3%
1533    +0.07   0.2%
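
The dashboard does not define its columns, so the following is only a minimal sketch of one plausible reading: that "% of L₁" is the absolute value of an entry divided by the L₁ norm (sum of absolute values) of the whole weight vector. The listed rows are roughly consistent with that reading if the norm is about 35, but this is an assumption, and the function names `l1_share` and `top_aligned` are hypothetical.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: this mirrors the "% of L1" column; the source does not say so.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    return abs(weights[index]) / l1


def top_aligned(weights, k=3):
    """Return (index, value, share) for the k largest-magnitude entries."""
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero; share is undefined")
    # Sort indices by absolute weight, largest first (ties keep input order).
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in order[:k]]


if __name__ == "__main__":
    toy = [0.5, -0.25, 0.25]  # toy vector, not data from the dashboard
    for idx, value, share in top_aligned(toy, k=2):
        print(f"{idx}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, multiplying the returned share by 100 gives the percentage shown in the third column.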
Correlated Neurons
Index   P. Corr.   Cos Sim.
1675    +0.14      0.05
374     +0.11      0.03
753     +0.07      0.01
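
Assuming "P. Corr." stands for the Pearson correlation and "Cos Sim." for cosine similarity (the source does not spell these out, nor which vectors are compared), the two metrics can be sketched as follows. The function names and inputs are hypothetical; which activations or weights the dashboard actually feeds in is not stated.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

The two measures differ in that Pearson correlation subtracts each sequence's mean before comparing, while cosine similarity does not, so the same pair of vectors can score differently on each.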
Negative Logits
apprehen   -1.14
oleo       -1.09
increa     -1.09
olx        -1.06
inev       -1.06
disagre    -1.04
hcm        -1.03
affor      -1.03
encomp     -1.03
susun      -1.02
Positive Logits
answer      1.06
replied     1.01
answered    0.98
reply       0.95
Answer      0.90
answer      0.89
answers     0.89
answering   0.86
responded   0.84
respond     0.81
Activations Density 0.319%