Explanations
references to specific entities such as people, publications, or organizations
Neuron Alignment
Index   Value   % of L₁
924     +0.09   0.3%
198     +0.08   0.2%
587     +0.08   0.2%
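The Neuron Alignment table lists the MLP neurons with the largest entries in this feature's decoder vector, with each entry's size expressed as a share of that vector's L₁ norm. Below is a minimal sketch of that computation. It assumes the decoder row for one feature is a NumPy array indexed by neuron, and that ranking is by absolute weight (an assumption). The function name `neuron_alignment` is hypothetical.

```python
import numpy as np


def neuron_alignment(w_dec_row: np.ndarray, k: int = 3):
    """Return the k neurons with the largest-magnitude decoder weights.

    w_dec_row: hypothetical decoder vector for one feature, shape (d_mlp,).
    Each result is (neuron index, weight, |weight| / L1 norm of the row).
    Ranking by absolute value is an assumption, not a confirmed dashboard rule.
    """
    l1 = np.abs(w_dec_row).sum()
    # Indices sorted by absolute weight, largest first
    top = np.argsort(-np.abs(w_dec_row))[:k]
    return [(int(i), float(w_dec_row[i]), float(abs(w_dec_row[i]) / l1)) for i in top]


# Toy example: L1 norm is 4.0, so neuron 2 holds 50% and neuron 3 holds 37.5%
row = np.array([0.0, 0.5, -2.0, 1.5])
print(neuron_alignment(row, k=2))
```

Small percentages such as 0.3% indicate that no single neuron dominates the decoder vector, i.e. the feature is spread across many neurons.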
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
613     +0.09           0.06
1997    +0.08           0.04
227     +0.08           0.05
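The two Correlated Neurons columns compare the feature's activations with a neuron's activations over the same tokens. A minimal sketch follows, assuming both columns are computed from per-token activation vectors, with cosine similarity taken on the raw vectors and Pearson correlation on mean-centred ones. The function name `pearson_and_cosine` is hypothetical.

```python
import numpy as np


def pearson_and_cosine(feature_acts: np.ndarray, neuron_acts: np.ndarray):
    """Compare one feature's and one neuron's activations, token by token.

    Assumption: both inputs are 1-D arrays over the same tokens.
    Returns (Pearson correlation, cosine similarity).
    """
    # Cosine similarity: no mean subtraction
    cos = feature_acts @ neuron_acts / (
        np.linalg.norm(feature_acts) * np.linalg.norm(neuron_acts)
    )
    # Pearson correlation: cosine similarity of the mean-centred vectors
    f = feature_acts - feature_acts.mean()
    n = neuron_acts - neuron_acts.mean()
    pearson = f @ n / (np.linalg.norm(f) * np.linalg.norm(n))
    return float(pearson), float(cos)


rng = np.random.default_rng(0)
feature = rng.normal(size=1000)
neuron = 0.1 * feature + rng.normal(size=1000)
print(pearson_and_cosine(feature, neuron))
```

Correlations near +0.09, as in the table above, mean that no individual neuron tracks this feature closely.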
Negative Logits
Himo        -0.71
aarrggbb    -0.63
affitto     -0.63
stadio      -0.62
Ubicación   -0.62
exé         -0.62
preghi      -0.62
Composição  -0.61
Octobre     -0.60
osoba       -0.60
Positive Logits
unspeak     0.71
intrigu     0.69
magazine    0.68
laug        0.68
Whence      0.66
outlander   0.64
glimp       0.64
kraken      0.63
horrend     0.63
gotcha      0.62
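The positive and negative logit lists show which output tokens this feature most directly promotes or suppresses. A common way to obtain them is to map the feature's decoder vector into the residual stream and project it through the unembedding matrix. The sketch below assumes an SAE trained on MLP neurons, so the decoder row is first multiplied by the MLP output weights. The names `w_dec_row`, `W_out`, `W_U`, `vocab` and `top_logits` are hypothetical, and any layer-norm folding or scaling the dashboard may apply is ignored.

```python
import numpy as np


def top_logits(w_dec_row, W_out, W_U, vocab, k=10):
    """Return the k most positive and k most negative (token, logit) pairs.

    Assumed shapes: w_dec_row (d_mlp,), W_out (d_mlp, d_model),
    W_U (d_model, d_vocab), vocab a list of d_vocab token strings.
    """
    # Map the feature's decoder vector from neuron space into the residual stream
    resid_dir = w_dec_row @ W_out  # -> (d_model,)
    # Direct effect on each output token's logit
    logits = resid_dir @ W_U  # -> (d_vocab,)
    order = np.argsort(logits)
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with identity MLP output weights and a 3-token vocabulary
pos, neg = top_logits(
    np.array([1.0, 0.5]),
    np.eye(2),
    np.array([[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]),
    ["a", "b", "c"],
    k=1,
)
print(pos, neg)
```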
Activation Density: 0.207%
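Activation density is usually the fraction of tokens on which the feature fires. If that is the meaning here, 0.207% corresponds to roughly one token in every 480. A minimal sketch, assuming "fires" means an activation strictly greater than zero. The function name `activation_density` is hypothetical.

```python
import numpy as np


def activation_density(feature_acts: np.ndarray) -> float:
    """Fraction of tokens on which the feature is active (activation > 0).

    Assumption: feature_acts is a 1-D array of this feature's activations
    over a sample of tokens.
    """
    return float(np.mean(feature_acts > 0))


acts = np.array([0.0, 0.3, 0.0, 1.2])
print(f"{activation_density(acts):.3%}")
```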