Explanations
Japanese names and terms
Neuron Alignment
Index   Value   % of L₁
50      +0.14   0.5%
1343    +0.08   0.3%
924     +0.07   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
227     +0.14      0.05
1343    +0.08      0.05
143     +0.07      0.04
Negative Logits
<bos>       -2.40
/*          -0.70
            -0.66
<?          -0.65
onView      -0.60
govina      -0.60
exitRule    -0.59
дописавши   -0.58
quine       -0.57
apnews      -0.56
Positive Logits
emphat      1.77
milf        1.77
madonna     1.76
affor       1.69
perfet      1.66
maneu       1.59
accla       1.58
stockholm   1.57
peppa       1.57
inev        1.55
Activations Density 0.157%