INDEX
Explanations
phrases related to emergencies and actions taken in response
Neuron Alignment
Index   Value   % of L₁
1150    +0.11   0.3%
453     +0.11   0.3%
946     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
946     +0.11      0.05
142     +0.11      0.03
1150    +0.10      0.02
Negative Logits
cushi        -0.78
cytok        -0.57
loamy        -0.56
créateur     -0.56
tupperware   -0.53
ennemi       -0.53
marchand     -0.53
anganronpa   -0.53
pageNo       -0.53
pixar        -0.52
Positive Logits
<bos>             0.72
alerted           0.64
InjectMocks       0.62
disambiguazione   0.58
sizeCache         0.54
notified          0.54
principalTable    0.54
fromnode          0.53
SharedDtor        0.53
noticing          0.51
Activations Density: 0.452%