INDEX
Explanations
personal stories or anecdotes
Neuron Alignment
Index   Value   % of L₁
50      +0.17   0.7%
1614    +0.09   0.3%
86      +0.08   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1988    +0.17      0.04
647     +0.09      0.04
1044    +0.08      0.05
Negative Logits
<bos>            -2.51
<?               -1.06
/**              -0.95
/*               -0.87
ⓧ                -0.82
                 -0.73
#                -0.69
HasColumnType    -0.65
HasAnnotation    -0.65
ngOnDestroy      -0.63
Positive Logits
maneu            1.63
affor            1.54
impra            1.52
increa           1.52
emphat           1.49
guarante         1.48
disgra           1.48
desir            1.48
effe             1.48
!...             1.47
Activations Density: 0.760%