INDEX
Explanations
words related to health and medicine
Neuron Alignment
Index   Value   % of L₁
68      +0.14   0.8%
47      +0.13   0.7%
178     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
68      +0.14      0.10
13      +0.13      0.05
271     +0.11      -0.02
Negative Logits
boy          -1.45
uu           -1.30
ľĵ           -1.30
missing      -1.30
cake         -1.28
Assumption   -1.27
yet          -1.23
still        -1.23
oh           -1.22
immun        -1.22
Positive Logits
wikipedia    1.81
ensued       1.52
decades      1.50
centuries    1.50
oretic       1.45
otom         1.45
StackTrace   1.42
ETHOD        1.41
argument     1.33
cite         1.33
Activations Density 1.818%