Explanations
information related to symptoms and effects of poisoning by a specific substance
Neuron Alignment
Index   Value   % of L₁
1343    +0.13   0.4%
906     +0.12   0.4%
1842    +0.11   0.3%
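The Value and % of L₁ columns can be reproduced with a short sketch. It assumes that Value is the feature's decoder weight on each MLP neuron and that % of L₁ is that weight's magnitude divided by the L₁ norm of the whole decoder vector. The page does not state either definition. The ranking by absolute weight, the function names, and the toy vector are all hypothetical.

```python
def l1_shares(decoder_vec):
    """Return (neuron_index, weight, share_of_L1) for every neuron.

    Assumption: share = |weight| / sum(|w| for w in decoder_vec).
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    return [(i, w, abs(w) / l1) for i, w in enumerate(decoder_vec)]


def top_aligned_neurons(decoder_vec, k=3):
    """Return the k neurons with the largest |weight| (hypothetical ranking rule)."""
    rows = l1_shares(decoder_vec)
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows[:k]


if __name__ == "__main__":
    # Toy 5-neuron decoder vector; these numbers are made up, not taken from the dashboard.
    toy = [0.10, -0.40, 0.25, 0.05, 0.20]
    for idx, weight, share in top_aligned_neurons(toy):
        print(f"{idx:<6}{weight:+.2f}  {share:.1%}")
```

The toy vector has an L₁ norm of 1.00, so each neuron's share equals the magnitude of its weight.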
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
736     +0.13           0.06
509     +0.12           0.05
1553    +0.11           0.05
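The two statistics in this table can be computed with a minimal sketch. It assumes that the correlation column is the Pearson correlation between the feature's activation and a neuron's activation over the same tokens. It also assumes that the cosine-similarity column compares two weight vectors tied to the feature and the neuron, although the page does not say which vectors. The function names and sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (e.g. per-token activations)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return math.nan  # undefined when either sequence is constant
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical per-token activations for one feature and one neuron.
    feature_acts = [0.0, 0.0, 2.0, 0.0, 3.0]
    neuron_acts = [0.1, 0.0, 1.5, 0.2, 2.0]
    print(f"Pearson corr.: {pearson(feature_acts, neuron_acts):+.2f}")
    print(f"Cosine sim.:   {cosine_similarity([1.0, 2.0], [2.0, 4.1]):.2f}")
```

Correlation is measured over activations on data, while cosine similarity compares weights. The two columns therefore need not agree.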
Negative Logits
démoc          -0.82
!!</           -0.81
Violon         -0.79
lancia         -0.78
boulangerie    -0.78
pacchetto      -0.77
Tourisme       -0.76
azzurro        -0.74
triump         -0.74
Simult         -0.74
Positive Logits
viewDid           0.54
symptoms          0.54
nausea            0.50
hallucinations    0.49
onStop            0.48
pylab             0.45
torchvision       0.45
loss              0.45
mici              0.45
smtplib           0.44
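The Negative Logits and Positive Logits lists can be read as the extremes of a per-token score. This minimal sketch assumes that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector, which is a direct logit effect. It ignores any final LayerNorm scaling that a real model would apply. The array layout, function names, and toy vocabulary are hypothetical.

```python
def direct_logit_effects(decoder_dir, unembed, vocab):
    """Score each token by decoder_dir . unembed[t].

    Assumed layout: unembed[t] is the d_model-dimensional unembedding vector of vocab[t].
    No LayerNorm scaling is applied here.
    """
    return [
        (token, sum(d * w for d, w in zip(decoder_dir, u_row)))
        for token, u_row in zip(vocab, unembed)
    ]


def top_and_bottom(scores, k):
    """Return (k highest-scoring tokens, k lowest-scoring tokens)."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    positive = ranked[::-1][:k]
    negative = ranked[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (all values hypothetical).
    decoder_dir = [1.0, -1.0]
    vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
    unembed = [[0.5, 0.0], [0.2, 0.1], [0.0, 0.4], [-0.3, 0.3]]
    pos, neg = top_and_bottom(direct_logit_effects(decoder_dir, unembed, vocab), k=2)
    print("Positive:", pos)
    print("Negative:", neg)
```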
Activation Density: 0.337%
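An activation density of 0.337% means the feature is active on roughly 3 or 4 of every 1,000 tokens. This minimal sketch assumes that density is the fraction of tokens whose activation is strictly above zero. A dashboard may use a different cutoff, so the threshold parameter is a hypothetical knob.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature fires (activation > threshold).

    The zero threshold is an assumption; the real cutoff is not stated.
    """
    if not activations:
        raise ValueError("no activations supplied")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # 337 firing tokens out of 100,000 corresponds to a density of 0.337%.
    acts = [0.0] * 99_663 + [1.0] * 337
    print(f"Activation density: {activation_density(acts):.3%}")
```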