Explanations
linguistic elements associated with South Asian languages
Neuron Alignment
Index   Value   % of L₁
1343    +0.18   0.8%
169     +0.11   0.5%
1052    +0.11   0.5%
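The Neuron Alignment table pairs each neuron index with a weight and that weight's share of an L₁ norm. A minimal sketch of how such a table could be computed, assuming "Value" is an entry of the feature's decoder vector, "% of L₁" is that entry divided by the vector's L₁ norm, and neurons are ranked by signed weight. The function name `neuron_alignment` is hypothetical, and the dashboard may differ in detail.

```python
def neuron_alignment(decoder_row, k=3):
    """Hypothetical helper: top-k neurons by decoder weight.

    Assumes decoder_row[i] is the weight from this feature to neuron i and
    that "% of L1" means weight / (sum of |weights| over all neurons).
    """
    l1 = sum(abs(w) for w in decoder_row)  # L1 norm of the decoder vector
    # Rank by signed weight, largest first (assumption: not by magnitude).
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], decoder_row[i] / l1) for i in ranked[:k]]


# Toy decoder vector whose L1 norm is exactly 1.0.
print(neuron_alignment([0.5, -0.25, 0.25, 0.0], k=2))
# [(0, 0.5, 0.5), (2, 0.25, 0.25)]
```

Under this reading, a +0.18 weight that is 0.8% of L₁ implies an L₁ norm of roughly 0.18 / 0.008 ≈ 22, so no single neuron dominates the feature's output direction.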
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
1343    +0.18           0.03
689     +0.11           0.01
1052    +0.11           0.01
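The two columns of the Correlated Neurons table measure how closely a neuron tracks the feature. A minimal sketch, assuming both columns compare the feature's activations with the neuron's activations over the same tokens; under that assumption, cosine similarity is the uncentered version of the Pearson correlation. The function names and toy activations are hypothetical, and the dashboard may compute these quantities differently.

```python
import math


def cosine_similarity(x, y):
    """Uncentered similarity: dot(x, y) / (|x| * |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation is the cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical activations of one feature and one neuron over five tokens.
feature_acts = [0.0, 0.0, 2.0, 0.0, 1.0]
neuron_acts = [0.3, -0.1, 0.9, 0.2, 0.4]
print(pearson_correlation(feature_acts, neuron_acts))
print(cosine_similarity(feature_acts, neuron_acts))
```

Because a sparse feature is zero on most tokens, centering can change the score substantially, which is one reason the two columns need not agree.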
Negative Logits
<bos>           -1.40
nakalista       -0.79
contentLoaded   -0.77
intptr          -0.76
windowFixed     -0.74
isContained     -0.73
wireType        -0.73
fromnode        -0.73
Chham           -0.72
IsContent       -0.71
Positive Logits
disagre    1.76
unlaw      1.71
affor      1.69
increa     1.67
maneu      1.61
inappro    1.59
impra      1.59
hairc      1.58
shenan     1.54
milf       1.54
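The Negative Logits and Positive Logits lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. A minimal sketch, assuming each token's score is the dot product of the feature's decoder vector with that token's column of an unembedding matrix `W_U`, ignoring any final layer norm. `logit_effects`, the toy matrix, and the toy vocabulary are hypothetical.

```python
def logit_effects(decoder_row, W_U, vocab, k=3):
    """Hypothetical helper: bottom-k and top-k tokens by direct logit effect.

    W_U[i][t] is assumed to be the unembedding weight from residual
    dimension i to token t; layer norm is ignored in this sketch.
    """
    d_model = len(decoder_row)
    scores = [sum(decoder_row[i] * W_U[i][t] for i in range(d_model))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy model: 2 residual dimensions, 3 tokens.
W_U = [[2.0, -1.0, 0.5],
       [9.0, 9.0, 9.0]]
print(logit_effects([1.0, 0.0], W_U, ["a", "b", "c"], k=1))
# ([('b', -1.0)], [('a', 2.0)])
```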
Activation Density: 0.030%
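Activation density is assumed here to be the fraction of tokens on which the feature fires, that is, has an activation above zero; 0.030% then corresponds to about 3 tokens in every 10,000. A minimal sketch under that assumption; `activation_density` and `format_density` are hypothetical names.

```python
def activation_density(acts):
    """Fraction of tokens where the feature's activation is strictly positive."""
    firing = sum(1 for a in acts if a > 0)
    return firing / len(acts)


def format_density(acts):
    """Render the density as a percentage with three decimals."""
    return f"{100 * activation_density(acts):.3f}%"


# Hypothetical activations: the feature fires on 3 of 10,000 tokens.
acts = [0.0] * 9997 + [1.2, 0.4, 3.0]
print(format_density(acts))  # 0.030%
```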