INDEX
Explanations
words related to the concept of being wrong
Neuron Alignment
Index   Value   % of L₁
50      +0.17   1.0%
1548    +0.11   0.7%
1350    +0.09   0.5%
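The % of L₁ column above can be read as each neuron's share of the L₁ norm of the feature's direction vector. That reading is an assumption; the page does not define the column. A minimal sketch under that assumption, using a toy vector rather than the real feature direction (the function name `top_alignment` is hypothetical):

```python
def top_alignment(direction, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: "% of L1" means |value| divided by the sum of absolute
    values over the whole vector.
    """
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        return []
    # Rank neuron indices by the magnitude of their component.
    ranked = sorted(range(len(direction)),
                    key=lambda i: abs(direction[i]),
                    reverse=True)
    return [(i, direction[i], abs(direction[i]) / l1) for i in ranked[:k]]


# Toy vector for illustration only; not the data behind the table.
toy = [0.5, -0.25, 0.125, 0.125]
rows = top_alignment(toy, k=2)
for idx, value, share in rows:
    print(f"{idx:<7} {value:+.2f}   {share:.1%}")
```

Under this reading, a top entry of +0.17 at 1.0% would imply an L₁ norm of roughly 17, so the direction is spread across many neurons rather than concentrated in a few.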
Correlated Neurons
Index   P. Corr.   Cos Sim.
1548    +0.17      0.02
468     +0.11      0.02
208     +0.09      0.02
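The two columns above measure different things, which is why they can disagree. Assuming P. Corr. is a Pearson correlation and Cos Sim. is a cosine similarity (the page does not say which vectors each is computed over), the sketch below shows the two measures side by side on toy data. The helper names `pearson` and `cosine` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance of mean-centered values, normalized."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def cosine(u, v):
    """Cosine similarity: dot product over the product of Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


# Toy data for illustration only; not the data behind the table.
a = [1.0, 2.0, 3.0]
b = [3.0, 4.0, 5.0]   # same shape as a, shifted by a constant
print(f"pearson={pearson(a, b):.3f} cosine={cosine(a, b):.3f}")
```

Pearson correlation ignores a constant offset because it centers both inputs first, while cosine similarity does not, so the two values differ on the toy data.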
Negative Logits
Token                   Logit
<bos>                   -3.51
(token not rendered)    -0.77
/*                      -0.71
<?                      -0.71
/***                    -0.70
facilitate              -0.67
exp                     -0.67
public                  -0.67
establish               -0.67
utilize                 -0.66
Positive Logits
Token       Logit
bandung     1.72
Minang      1.66
maroc       1.58
stockholm   1.56
jaya        1.50
lele        1.48
hcm         1.48
lidl        1.46
eiffel      1.45
bordeaux    1.43
Activations Density 0.053%