Explanations
phrases or sentences that include the word "who."
Neuron Alignment
Index   Value   % of L₁
231     +0.13   0.7%
500     +0.13   0.7%
39      +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
203     +0.13      0.36
23      +0.13      0.31
494     +0.11      0.25
Negative Logits
»        -2.05
ŀ        -1.94
¿½       -1.91
ķ        -1.75
Clinic   -1.71
ľ        -1.62
Ľ        -1.60
Breast   -1.58
¤        -1.57
Ĺ        -1.55
Positive Logits
asting   1.75
minded   1.52
foul     1.48
ismiss   1.46
ushes    1.45
SEE      1.45
uttle    1.39
atter    1.39
asted    1.37
aven     1.37
Activations Density: 3.078%