INDEX
Explanations
words related to specific identities or classifications
Neuron Alignment
Index   Value   % of L₁
419     +0.28   1.6%
121     +0.16   0.9%
479     +0.15   0.9%
Correlated Neurons
Index   P. Corr.   Cos Sim.
419     +0.28      0.09
134     +0.16      0.07
66      +0.15      0.07
Negative Logits
noreply     -1.59
heartbeat   -1.49
saf         -1.42
roit        -1.36
Proced      -1.27
probable    -1.25
Fourth      -1.24
},\\        -1.23
amac        -1.22
nearby      -1.22
Positive Logits
talks          1.71
uler           1.61
negotiations   1.54
discussions    1.49
ondo           1.49
apart          1.43
inking         1.43
anian          1.42
gard           1.39
uled           1.38
Activations Density: 0.538%