INDEX
Explanations
the word "kind" or its variations
Neuron Alignment
Index   Value   % of L₁
410     +0.14   0.6%
1222    +0.13   0.5%
1351    +0.12   0.5%
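The page does not define the "% of L₁" column. One plausible reading, offered here only as an assumption, is that each value is a component of the feature's weight vector and the percentage is that component's absolute value divided by the vector's L₁ norm (the sum of absolute values over all neurons). The helper name `l1_share` and the toy vector below are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, top_k=3):
    """Return (index, value, share of L1 norm) for the top_k
    largest-magnitude entries of a weight vector.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []  # no meaningful share for an all-zero vector
    # Rank neuron indices by absolute weight, largest first.
    ranked = sorted(range(len(weights)),
                    key=lambda i: abs(weights[i]),
                    reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in ranked[:top_k]]


# Toy vector for illustration only, not the real feature weights.
toy = [0.5, -0.25, 0.125, 0.125]
for idx, value, share in l1_share(toy, top_k=2):
    print(f"{idx}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, small percentages such as 0.5% to 0.6% for the top-ranked neurons would suggest that the feature's weight is spread across many neurons rather than concentrated in a few.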
Correlated Neurons
Index   P. Corr.   Cos Sim.
410     +0.14      0.04
1222    +0.13      0.04
1351    +0.12      0.03
Negative Logits
maksi             -0.68
roh               -0.63
stör              -0.55
zub               -0.54
FormBorderStyle   -0.53
oğlu              -0.53
lenmiş            -0.50
plak              -0.49
lü                -0.48
loh               -0.48
Positive Logits
kind    1.08
KIND    1.05
kind    1.05
Kind    1.03
Kind    0.95
KIND    0.93
kinds   0.81
kinds   0.78
sort    0.75
Kinds   0.72
Activations Density 0.057%